This guide is compiled from reddit comments, but it will also eventually be outdated again.
```bash
# 1 install WSL2 on Windows 11, then:
sudo apt update
sudo apt-get install build-essential
sudo apt install git -y

# optional: install a better terminal experience, otherwise skip to step 4

# 2 install brew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
(echo; echo 'eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"') >> /home/$USER/.bashrc
eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
brew doctor

# 3 install oh-my-posh
brew install jandedobbeleer/oh-my-posh/oh-my-posh
echo $(brew --prefix oh-my-posh)/themes
# copy the printed path and add it below to the second eval line:
sudo nano ~/.bashrc
# add this to the end:
# eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
# eval "$(oh-my-posh init bash --config '/home/linuxbrew/.linuxbrew/opt/oh-my-posh/themes/atomic.omp.json')"
# plugins=(
#   git
#   # other plugins
# )
# CTRL+X to end editing
# Y to save changes
# ENTER to finally exit
source ~/.bashrc
exec bash

# 4 install mamba instead of conda, because it's faster
# https://mamba.readthedocs.io/en/latest/installation.html
mkdir github
mkdir downloads
cd downloads
wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
bash Mambaforge-$(uname)-$(uname -m).sh

# 5 install the correct cuda toolkit 11.7, not 12.x
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
sudo sh cuda_11.7.0_515.43.04_linux.run
nano ~/.bashrc
# add the following line after the plugins=() code block, above conda initialize,
# in order to add the cuda library to the environment variable:
# export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
# CTRL+X to end editing
# Y to save changes
# ENTER to finally exit
source ~/.bashrc
cd ..

# 6 install ooba's textgen
mamba create --name textgen python=3.10.9
mamba activate textgen
pip install torch==2.0.1+cu117 torchvision==0.15.2+cu117 torchaudio -f https://download.pytorch.org/whl/cu117/torch_stable.html
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt

# 7 install 4bit support through GPTQ-for-LLaMa
mkdir repositories
cd repositories
# choose ONE of the following:
# A) for fast triton
# https://www.reddit.com/r/LocalLLaMA/comments/13g8v5q/fastest_inference_branch_of_gptqforllama_and/
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b fastest-inference-4bit
# B) for triton
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b triton
# C) for newer cuda
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
# D) for widely compatible old cuda
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
#
# groupsize, act-order, true-sequential:
# --act-order: quantizing columns in order of decreasing activation size
# --true-sequential: performing sequential quantization even within a single Transformer block
# Those fix GPTQ's strangely bad performance on the 7B model (from 7.15 to 6.09 Wiki2 PPL)
# and lead to slight improvements on most models/settings in general.
# --groupsize: currently, groupsize and act-order do not work together and you must choose one of them.
# (Ooba: there is a pytorch branch from qwop that allows you to use groupsize and act-order together.)
# Models without group-size: better for the 7b model
# Models with group-size: better from 13b upwards
cd GPTQ-for-LLaMa
pip install -r requirements.txt
python setup_cuda.py install
cd ..
cd ..
```
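If you want to produce a 4-bit checkpoint yourself instead of downloading one, the flags above are passed to GPTQ-for-LLaMa's `llama.py`. A minimal sketch, assuming a HF-format LLaMA model at a placeholder path (the output filename is illustrative):

```bash
cd repositories/GPTQ-for-LLaMa
# quantize with act-order + true-sequential; swap --act-order for --groupsize 128
# if you want a grouped model instead, since the two flags don't combine
python llama.py /path/to/llama-7b-hf c4 --wbits 4 --true-sequential --act-order --save llama7b-4bit.pt
cd ../..
```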
```bash
# 8 test ooba with a 4bit GPTQ model
python download-model.py 4bit/WizardLM-13B-Uncensored-4bit-128g
python server.py --wbits 4 --model_type llama --groupsize 128 --chat

# 9 install llama.cpp
cd repositories
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
nano ~/.bashrc
# add the cuda bin folder to the path environment variable, after the
# export LD_LIBRARY_PATH line, in order for make to find nvcc:
# export PATH=/usr/local/cuda/bin:$PATH
# CTRL+X to end editing
# Y to save changes
# ENTER to finally exit
source ~/.bashrc
make LLAMA_CUBLAS=1
cd models
wget https://huggingface.co/TheBloke/WizardLM-13B-Uncensored-GGML/resolve/main/wizardLM-13B-Uncensored.ggmlv3.q4_0.bin
cd ..

# 10 test llama.cpp with GPU support
./main -t 8 -m models/wizardLM-13B-Uncensored.ggmlv3.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: write a story about llamas ### Response:" --n-gpu-layers 30
cd ..
cd ..

# 11 prepare ooba's textgen for llama.cpp support, by compiling llama-cpp-python with cuda GPU support
pip uninstall -y llama-cpp-python
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
```
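With llama-cpp-python rebuilt for CUDA, ooba can also load the GGML file directly. A sketch, assuming you first make the model visible in ooba's own models folder (run from text-generation-webui; `--n-gpu-layers` assumes a webui build recent enough to expose llama.cpp GPU offload):

```bash
# illustrative: copy the GGML model into ooba's models folder, then serve it with GPU offload
cp repositories/llama.cpp/models/wizardLM-13B-Uncensored.ggmlv3.q4_0.bin models/
python server.py --model wizardLM-13B-Uncensored.ggmlv3.q4_0.bin --n-gpu-layers 30 --chat
```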
Installation guide from 2023-03-01 (outdated)
```bash
wsl --install
# You may be prompted to restart your computer. If so, save your work and restart.
sudo apt update
sudo apt upgrade
sudo apt install git
sudo apt install wget
mkdir downloads
cd downloads/
wget https://repo.anaconda.com/archive/Anaconda3-2023.03-1-Linux-x86_64.sh
chmod +x ./Anaconda3-2023.03-1-Linux-x86_64.sh
./Anaconda3-2023.03-1-Linux-x86_64.sh
# follow the defaults
sudo apt install build-essential
cd ..
conda create -n textgen python=3.10.9
conda activate textgen
pip3 install torch torchvision torchaudio
mkdir github
cd github
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
pip install chardet cchardet
```

If you want to try the triton branch, skip to Newer GPTQ-Triton.
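Before building any GPTQ kernels, it's worth confirming that the freshly installed PyTorch can actually see your GPU:

```bash
# should print True; if it prints False, fix the CUDA/driver setup before continuing
python -c "import torch; print(torch.cuda.is_available())"
```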
For `--groupsize 128 --wbits 4` and no-act-order models:

```bash
mkdir repositories
cd repositories
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
# (or try the newer https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda build)
cd GPTQ-for-LLaMa
python -m pip install -r requirements.txt
python setup_cuda.py install
# if this gives an error about g++, try installing the correct g++ version:
conda install -y -k gxx_linux-64=11.2.0
cd ../..
```

This triton branch or this one,
for `--groupsize 128 --wbits 4` flags and act-order models:

```bash
mkdir repositories
cd repositories
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
# (or try https://github.com/fpgaminer/GPTQ-triton)
cd GPTQ-for-LLaMa
pip install -r requirements.txt
cd ../..
```
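The triton branches depend on the `triton` package, which only runs on Linux/WSL2; a quick import check tells you whether it is usable before you build anything on top of it:

```bash
# an ImportError here means the triton branches won't work in this environment
python -c "import triton; print(triton.__version__)"
```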
Alternatively you can try AutoGPTQ to install the cuda, older llama-cuda, or triton variants:

```bash
pip install auto-gptq            # cuda branch, for newer models
pip install auto-gptq[llama]     # if your transformers is outdated or you are using older models that don't support it
pip install auto-gptq[triton]    # triton branch, for triton compatible models
cd ../..
```

If you want to open the webui from within your home network, enable port forwarding on your Windows machine, with this command in an administrator terminal:
```
netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=7860 connectaddress=localhost connectport=7860
```
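Windows Defender Firewall may still block connections from other devices; if so, you can additionally allow inbound TCP 7860 (the rule name below is just an example), and delete the proxy again once you no longer need it:

```
:: allow inbound connections on port 7860 (rule name "textgen" is arbitrary)
netsh advfirewall firewall add rule name="textgen" dir=in action=allow protocol=TCP localport=7860
:: remove the port proxy when you no longer need it
netsh interface portproxy delete v4tov4 listenaddress=0.0.0.0 listenport=7860
```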
```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/wsl/lib   # before running the server.py below
pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda113
# optional: allows for faster, but non-deterministic inference
# (add the --xformers flag later, when running the server.py below)
pip install xformers
```

You're done with the Ubuntu / WSL2 installation; you can skip to the Download models section.
Open the Anaconda Prompt (Miniconda 3) from the Start Menu, then from `C:\Users\yourusername>`:

```
mkdir github
cd github
conda create --name textgen python=3.10
conda activate textgen
conda install pip
conda install -y -k pytorch[version=2,build=py3.10_cuda11.7*] torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit ninja git -c pytorch -c nvidia/label/cuda-11.7.0 -c nvidia
git clone https://github.com/oobabooga/text-generation-webui.git
python -m pip install https://github.com/jllllll/bitsandbytes-windows-webui/raw/main/bitsandbytes-0.38.1-py3-none-any.whl
cd text-generation-webui
pip install -r requirements.txt --upgrade
mkdir repositories
cd repositories
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
cd GPTQ-for-LLaMa
python -m pip install -r requirements.txt
python setup_cuda.py install
:: might fail; continue with the next command if so
pip install https://github.com/jllllll/GPTQ-for-LLaMa-Wheels/raw/main/quant_cuda-0.0.0-cp310-cp310-win_amd64.whl
:: skip this command, if the previous one didn't fail
cd ..\..
:: (go back to text-generation-webui)
pip install faust-cchardet
pip install chardet
python download-model.py TheBloke/wizardLM-7B-GGML
:: and let it download the model files
```
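`quant_cuda` is the extension module that setup_cuda.py builds (and that the prebuilt wheel ships), so an import check tells you whether the 4-bit kernel actually installed:

```
python -c "import quant_cuda; print('quant_cuda OK')"
```

An ImportError here means neither setup_cuda.py nor the wheel installed correctly.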
The base command to run. You have to add further flags, depending on the model and environment you want to run in:

```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/wsl/lib   # always, before running the server.py
python server.py --model-menu --chat
```

- `--model-menu` allows changing models in the UI
- `--chat` loads the chat instead of the text completion UI
- `--wbits 4` loads a 4-bit quantized model
- `--groupsize 128` add this parameter if the model specifies a groupsize
- `--model_type llama` if the model name is unknown, specify its base model. If you run llama-derived models like vicuna, alpaca, gpt4-x, codecapybara or wizardLM, you have to define it as llama. If you load OPT or GPT-J models, set the flag accordingly.
- `--xformers` if you have properly installed xformers and want faster but nondeterministic answer generation

If you get a `cuda lib not found` error, especially on Windows WSL2 Ubuntu, try executing `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/wsl/lib` before running the server.py above.
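Rather than typing that export in every session, you can append it to `~/.bashrc` once, so every new shell sets it automatically:

```bash
# one-time: persist the library path for all future shells
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/wsl/lib' >> ~/.bashrc
source ~/.bashrc
```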
If the server complains about chardet or cchardet, try:

```bash
pip install faust-cchardet
pip install chardet
```

or the other way around. Then try to start the server again.
On Windows Native, try:
```
pip uninstall bitsandbytes
pip install git+https://github.com/Keith-Hon/bitsandbytes-windows.git
```

Or try these prebuilt wheels on Windows:
Still having problems? Try to manually copy the libraries.
On Linux or Windows WSL2 Ubuntu, try:
```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/wsl/lib   # before running the server.py, every time!
pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda113
# and see if it works without the above command
pip install xformers==0.0.16rc425
```

If none of that helps, use llama.cpp instead (HN discussion).
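A quick way to see whether bitsandbytes loads at all after any of the fixes above (older versions print a "CUDA SETUP" banner on import, which shows what was found):

```bash
# a clean import with no CUDA SETUP errors means the library found your CUDA runtime
python -c "import bitsandbytes; print('bitsandbytes OK')"
```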
See an up-to-date list of most models you can run locally: awesome-ai open-models.
See the awesome-ai LLM section for more tools, GUIs, etc.