Troubleshooting

This page lists common errors and ways to address them.

Before starting your subnet validator node, ensure the following are true:


  1. I get this error on WSL: hivemind.dht.protocol.ValidationError: local time must be within 3 seconds of others. What should I do?

    The clocks on all nodes must be synchronized. Set the time from an NTP server:

    sudo apt install ntpdate
    sudo ntpdate pool.ntp.org
  2. The server starts loading blocks and then prints: Killed. What should I do?

    This happens because Windows doesn't allocate much RAM to WSL by default, so the server is killed by the out-of-memory (OOM) killer.

    To increase the memory limit, go to C:/Users/username and create a .wslconfig file with the following contents:

    [wsl2]
    memory=12GB

    Then restart WSL so the new limit takes effect (run sudo reboot in the WSL console, or wsl --shutdown from a Windows terminal and then reopen WSL).

  3. I get this error: torch.cuda.OutOfMemoryError: CUDA out of memory. What should I do?

    If you use an Anaconda env, run this before starting the server:

    export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

    If you use Docker, add this argument after --rm in the Docker command:

    -e "PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128"
  4. The WSL clock tends to drift out of sync, which prevents the server from launching with the error hivemind.dht.protocol.ValidationError: local time must be within 3 seconds of others.

    To sync the WSL clock, run sudo ntpdate pool.ntp.org. More fixes are discussed on Stack Overflow.
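For the clock errors in items 1 and 4, it can help to check how far off the clock is before and after syncing. Running ntpdate -q pool.ntp.org queries the server and reports the measured offset without changing the clock. The sketch below is a hypothetical helper (offset_ok is not part of hivemind or ntpdate) that takes an offset in seconds, as read from that output, and checks it against the 3-second tolerance named in the error message.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given clock offset,
# in seconds, is within the tolerance. The offset may be negative or
# fractional. The 3-second default comes from the error message text.
offset_ok() {
    awk -v off="$1" -v max="${2:-3}" 'BEGIN {
        if (off < 0) off = -off      # compare the absolute offset
        exit !(off <= max)           # exit 0 when within tolerance
    }'
}

# Example with an offset value read manually from the query output:
offset_ok -0.8 && echo "clock is within tolerance"
```

If the check fails, syncing again with sudo ntpdate pool.ntp.org as described above should bring the offset back under the limit.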
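For the Killed error in item 2, one way to confirm that the new .wslconfig limit took effect is to look at how much RAM the WSL instance can see after restarting. This is a minimal sketch, assuming a standard Linux /proc/meminfo; the function name mem_total_gib is illustrative only. The reported value is usually slightly below the configured limit.

```shell
# Print the total RAM visible to this Linux (WSL) instance, in GiB.
# Reads /proc/meminfo by default; an alternative file can be passed as $1.
mem_total_gib() {
    # MemTotal is reported in kB, so divide twice by 1024 to get GiB.
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

mem_total_gib
```

Running free -h in the WSL console gives the same information in a more detailed form.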
