feat(docs): Add guide Llama-CPP Linux AMD GPU support (#1782)
##### Llama-CPP Linux AMD GPU support

Linux GPU support is provided through ROCm.
Some tips:
* Install ROCm by following the [quick-start install guide](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html)
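Once ROCm is installed you can quickly check that the GPU is visible to the runtime. This is a minimal sanity check and assumes the default `/opt/rocm` install location:
```bash
# The detected agents should include your GPU's gfx target (e.g. gfx1100)
/opt/rocm/bin/rocminfo | grep -i gfx
# Shows utilization, VRAM and temperature for each detected GPU
/opt/rocm/bin/rocm-smi
```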
* [Install PyTorch for ROCm](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/install-pytorch.html)
```bash
# Download the PyTorch 2.1.1 wheel built against ROCm 6.0 for Python 3.11
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.0/torch-2.1.1%2Brocm6.0-cp311-cp311-linux_x86_64.whl
# Replace the torch build inside the project's virtual environment with the ROCm one
poetry run pip install --force-reinstall --no-cache-dir torch-2.1.1+rocm6.0-cp311-cp311-linux_x86_64.whl
```
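To confirm the ROCm build of PyTorch ended up in the project's virtual environment, a minimal check is (the HIP version printed should not be `None`):
```bash
# Expect something like "6.0.xxxxx True" on a working ROCm setup
poetry run python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"
```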
* Install bitsandbytes for ROCm
```bash
# GPU architectures to build kernels for (passed to the fork's Makefile as ROCM_TARGET)
PYTORCH_ROCM_ARCH=gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100,gfx1101,gfx940,gfx941,gfx942
# Pinned commit of the ROCm fork of bitsandbytes
BITSANDBYTES_VERSION=62353b0200b8557026c176e74ac48b84b953a854
git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6
cd bitsandbytes-rocm-5.6
git checkout ${BITSANDBYTES_VERSION}
make hip ROCM_TARGET=${PYTORCH_ROCM_ARCH} ROCM_HOME=/opt/rocm/
pip install . --extra-index-url https://download.pytorch.org/whl/nightly
```
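A plain import is enough to confirm the freshly built bitsandbytes loads without errors (prefix the command with `poetry run` if you installed it into the project's virtual environment):
```bash
# Importing triggers the library's backend detection; it should not raise
python -c "import bitsandbytes"
```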
After that, running the following command in the repository will install llama.cpp with GPU support:
```bash
# llama-cpp-python version to (re)install
LLAMA_CPP_PYTHON_VERSION=0.2.56
# Quote the target list: unquoted semicolons would be interpreted by the shell as command separators
AMDGPU_TARGETS="gfx900;gfx906;gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx940;gfx941;gfx942"
CMAKE_ARGS="-DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DAMDGPU_TARGETS=${AMDGPU_TARGETS}" poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python==${LLAMA_CPP_PYTHON_VERSION}
```
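To verify that the rebuilt package replaced the previous one in the project's virtual environment, you can print the installed version (it should match the value set above):
```bash
# Expect: 0.2.56
poetry run python -c "import llama_cpp; print(llama_cpp.__version__)"
```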
If your installation was correct, the next time you start the server you should see startup output similar to the following, including `BLAS = 1`:

```
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
```
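To double-check that prompts are actually processed on the GPU, you can watch utilization and VRAM usage while the server answers a query (assumes the ROCm tools are on your PATH):
```bash
# Refreshes GPU utilization / VRAM statistics every second
watch -n 1 rocm-smi
```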
##### Llama-CPP Known issues and Troubleshooting

Execution of LLMs locally still has a lot of sharp edges, especially when running on non-Linux platforms.