
Running on AMD GPU

Direction from the BigScience team



September 2023, tested on 7900 XTX

Following the great instructions from August and using the Docker image, this runs on the 7900 XTX with a few changes, most notably:

export HSA_OVERRIDE_GFX_VERSION=11.0.0 # the 7900 XTX natively works with the gfx1100 target
make hip ROCM_TARGET=gfx1100

The rest of the steps are the same as in the August instructions.
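The override string tracks the gfx target name: gfx1030 corresponds to 10.3.0 and gfx1100 to 11.0.0, where the last two characters of the target are hexadecimal minor and stepping digits and the leading digits are the decimal major version. A small sketch of that convention (hsa_override_for is a hypothetical helper, not part of any ROCm tool):

```python
def hsa_override_for(gfx_target: str) -> str:
    """Map a gfx target name to an HSA_OVERRIDE_GFX_VERSION string.

    Convention: the trailing two characters are hexadecimal minor and
    stepping digits; everything before them is the decimal major version.
    """
    digits = gfx_target.removeprefix("gfx")
    major, minor, step = digits[:-2], digits[-2], digits[-1]
    return f"{int(major)}.{int(minor, 16)}.{int(step, 16)}"

print(hsa_override_for("gfx1100"))  # -> 11.0.0 (7900 XTX)
print(hsa_override_for("gfx1030"))  # -> 10.3.0 (6000-series override)
```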

August 2023, tested on 6900 XT and 6600 XT

Due to the great work of Odonata (Discord, GitHub), the hardware of oceanmasterza (Discord), and the help of epicx (Discord, GitHub), we have the AMD instructions below.

According to the author of the bitsandbytes ROCm port, @arlo-phoenix, using a Docker image is recommended (both the rocm/pytorch and rocm/pytorch-nightly images should work). See the port discussion for details.

On host machine, run:

docker pull rocm/pytorch-nightly
sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rocm/pytorch-nightly

In the running image, run:

cd /home
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Install bitsandbytes with ROCM support
git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6.git bitsandbytes
cd bitsandbytes
make hip ROCM_TARGET=gfx1030
pip install pip --upgrade
pip install .

# Install Subnet
cd ..
pip install --upgrade git+https://github.com/hypertensor-blockchain/dsn

# Run server
python -m subnet.cli.run_server MODEL_URL --port <an open port> --torch_dtype float16
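The --port value must be a port that is free and reachable from outside. If you are unsure which ports are available, one quick way is to let the OS pick one by binding to port 0 (an illustrative snippet, independent of the subnet code):

```python
import socket

# Ask the OS for a free TCP port by binding to port 0, then release it.
# Another process could grab the port before the server starts, so treat
# this as a suggestion rather than a guarantee.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind(("", 0))
    free_port = s.getsockname()[1]

print(free_port)
```

Remember that with --network=host the port must also be open in the host's firewall.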

Running the model in bfloat16 is also supported but slower than in float16.
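Both float16 and bfloat16 store weights in 2 bytes per parameter, so the dtype choice trades speed and numeric range rather than memory, while int8 quantization halves the footprint again. A rough, illustrative estimate (weight_memory_gib is a hypothetical helper; real usage adds activations and caches on top):

```python
def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for the model weights alone.

    Ignores activations, attention caches, and framework overhead,
    so treat the result as a lower bound.
    """
    return n_params * bytes_per_param / 2**30

# For a hypothetical 7B-parameter model:
print(round(weight_memory_gib(7e9, 2), 1))  # float16 / bfloat16 -> 13.0 GiB
print(round(weight_memory_gib(7e9, 1), 1))  # int8 -> 6.5 GiB
```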

Multi-GPU serving (--tensor_parallel_devices) is still untested (the Docker --gpus flag may not function at this time, and other virtualization tools may be necessary).

July 2023, tested on 6900 XT and 6600 XT

Tested on:

  • AMD 6600 XT: tested July 24th, 2023 on Arch Linux with ROCm 5.6.0 and mesa 22.1.4

  • AMD 6900 XT: tested April 18th, 2023 on bare-metal Ubuntu 22.04 (no Docker/Anaconda/container) with ROCm 5.4.2

  • Untested on the 7000 series; however, 7000-series cards may perform much better, since AMD added a machine-learning tensor library and better hardware support (versus ray tracing only on the 6000 series)

Guide:

  • install ROCm. Use this tutorial for Arch Linux: https://wiki.archlinux.org/title/GPGPU

  • use the mesa-clover and mesa-rusticl opencl variants

  • add export HSA_OVERRIDE_GFX_VERSION=10.3.0 to your environment (put it in /home/user/.bashrc on Ubuntu; this tricks ROCm into working on more consumer cards like the 6000 series)

  • create and activate a venv for the subnet using Python 3.11:

    • python -m venv <yourvenvpath>

    • cd <yourvenvpath>

    • source bin/activate

  • in the venv, install the nightly version of PyTorch with the command generated for your setup by the website: https://pytorch.org/get-started/locally/

  • install the subnet LLM template version with AMD GPU support:

    pip install git+https://github.com/hypertensor-blockchain/dsn@amd-gpus

    Tip: You can set your fans to full speed or close to it before starting the subnet (the default Linux fan profile for AMD GPUs is not good on some cards): rocm-smi --setfan 99%

  • run the subnet using:

    python -m subnet.cli.run_server MODEL_NAME

    Tip: You can monitor temperature and voltage by running: rocm-smi && rocm-smi -t

This branch uses an older, patched version of bitsandbytes to provide AMD GPU support (developed by @brontoc and Titaniumtown). This means that you won't be able to use 4-bit quantization (--quant_type nf4) or LoRA adapters (the --adapters argument). The server will use 8-bit quantization (int8) for all models by default.

Contributed by: @edt-xx, @bennmann
