Running on AMD GPU
Direction from the BigScience team
September 2023, tested on 7900 XTX
Following the great instructions from August and using the Docker image, this runs on the 7900 XTX with a few changes, most notably:
The rest of the steps are the same
August 2023, tested on 6900 XT and 6600 XT
Due to the great work of Odonata (Discord, github @edt-xx), the hardware of oceanmasterza (Discord), and the help of epicx (Discord, GitHub @bennmann), we have the below AMD instructions.
According to the author of the bitsandbytes ROCm port, @arlo-phoenix, using a Docker image is recommended (both `rocm/pytorch` and `rocm/pytorch-nightly` should work). See the port discussion here.
On host machine, run:
In the running image, run:
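The exact commands for these two steps were not preserved here; a minimal sketch, assuming the `rocm/pytorch` image and the standard ROCm device-passthrough flags, might look like:

```shell
# On the host: start the ROCm PyTorch container with GPU access.
# /dev/kfd and /dev/dri plus the video group are the usual ROCm
# passthrough; adjust the image tag to match your ROCm version.
docker run -it \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video \
  --ipc=host --shm-size 8G \
  rocm/pytorch:latest

# In the running image: verify that PyTorch sees the GPU
# (ROCm builds of PyTorch expose the GPU through the CUDA API).
python3 -c "import torch; print(torch.cuda.is_available())"
```

The flags above are the commonly documented ROCm Docker setup, not the exact commands from the original guide.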
Running the model in `bfloat16` is also supported, but it is slower than `float16`.
Multi-GPU inference (`--tensor_parallel_devices`) is still untested (the Docker `--gpus` flag may not function at this time, and other virtualization tools may be necessary).
July 2023, tested on 6900 XT and 6600 XT
Contributed by: @edt-xx, @bennmann
Tested on:
AMD 6600 XT tested July 24th, 2023 on Arch Linux with Rocm 5.6.0, mesa 22.1.4
AMD 6900 XT tested April 18th, 2023 on bare metal Ubuntu 22.04 (no docker/anaconda/container). Tested with ROCM 5.4.2
Untested on the 7000 series; however, 7000-series cards may perform much better, since AMD added a machine learning tensor library and broader hardware support (versus ray tracing only on the 6000 series).
Guide:
use the mesa-clover and mesa-rusticl OpenCL variants
add `export HSA_OVERRIDE_GFX_VERSION=10.3.0` to your environment (put it in /home/user/.bashrc on Ubuntu; this tricks ROCm into working on more consumer cards like the 6000 series)
install ROCm. Use this tutorial for Arch Linux: https://wiki.archlinux.org/title/GPGPU
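As a concrete sketch, the override can be appended to your shell profile so it persists across sessions (the `10.3.0` value corresponds to gfx1030, the RDNA2 architecture of the 6000 series):

```shell
# Append the ROCm architecture override to the shell profile,
# then load it into the current shell and confirm it is set.
echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> ~/.bashrc
source ~/.bashrc
echo "$HSA_OVERRIDE_GFX_VERSION"
```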
create and activate a venv for the subnet using python 3.11
```shell
python -m venv <yourvenvpath>
cd <yourvenvpath>
source bin/activate
```
in the venv install pytorch, nightly version, with the command generated by the website: https://pytorch.org/get-started/locally/
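The generated command depends on your ROCm version; as an illustrative example (the ROCm 5.6 index URL here is an assumption, so use whatever the website generates for your setup), it looks roughly like:

```shell
# Install the PyTorch nightly build for ROCm inside the activated venv
# (index URL shown for ROCm 5.6; substitute the one from the website).
pip3 install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.6
```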
install the subnet LLM template version with AMD GPU support:
This branch uses an older version of `bitsandbytes` patched to have AMD GPU support (developed by @brontoc and Titaniumtown). This means that you won't be able to use 4-bit quantization (`--quant_type nf4`) or LoRA adapters (the `--adapters` argument). The server will use 8-bit quantization (int8) for all models by default.
Tip: You can set your fans to full speed or close to it before starting the subnet (the default Linux fan profile for AMD GPUs is not good on some cards):
```shell
rocm-smi --setfan 99%
```
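If you raise the fan speed manually, you can hand control back to the driver afterwards; assuming your `rocm-smi` build supports it, the reset looks like:

```shell
# Return fan control to the automatic driver profile
rocm-smi --resetfans
```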
run the subnet using:
Tip: You can monitor temperature and voltage by running this:
```shell
rocm-smi && rocm-smi -t
```
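For continuous monitoring, the same commands can be wrapped in `watch` (standard on most Linux distributions):

```shell
# Refresh GPU status and temperature readings every 5 seconds
watch -n 5 'rocm-smi && rocm-smi -t'
```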