GPT

You can chat with a decentrally hosted, uncensored version of Llama 3.1 8B at chat.hypertensor.org. The model is validated in a subnet using the Decentralized Subnet Standard.

This is a beta version of the chat GPT; expect bugs.

Features

  • Text generation

  • Speech to text

  • Text to speech

Performance

With good GPUs, a fast internet connection, and favorable geography, expect between 8 and 20 tokens per second.

Model          GPU           Tokens Per Second
Llama 3.1 8B   NVIDIA 3070   6-15
Llama 3.1 8B   NVIDIA T4     4-8.5
Llama 3.1 8B   NVIDIA 4090   7.1-20

Tokens per second can drop when many clients are using the subnet at once, or when there are no nodes in your region, since latency increases.
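
If you want to see how your own connection compares with the ranges above, you can time the token stream on the client side. The sketch below is a minimal illustration in Python; the `fake_stream` generator is a hypothetical stand-in for whatever token stream the chat client exposes, not part of Hypertensor's API.

```python
import time

def tokens_per_second(stream):
    """Count tokens yielded by any iterable and divide by elapsed wall-clock time."""
    start = time.monotonic()
    count = 0
    for _token in stream:
        count += 1
    elapsed = time.monotonic() - start
    return count / elapsed if elapsed > 0 else 0.0

if __name__ == "__main__":
    # Hypothetical stand-in: replace with the real token stream from the chat client.
    def fake_stream(n=100, delay=0.05):
        for i in range(n):
            time.sleep(delay)  # simulate network + generation latency per token
            yield f"tok{i}"

    print(f"{tokens_per_second(fake_stream()):.1f} tokens/s")
```

Averaging over a longer response gives a more stable reading, since the first token usually takes noticeably longer than the rest.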

Because this is an unincentivized testnet, most nodes in the subnet are not running high-performance hardware.
