Dynamic GPU Allocation

Split GPU resources on multi-GPU machines to run more jobs concurrently. Instead of dedicating entire machines to single executions, allocate only the GPUs each job needs.

This feature is particularly effective on on-premises servers with multiple GPUs. It's not enabled by default; your organization administrator must configure it first.

When to Use Dynamic Allocation

On-premises multi-GPU servers: Run multiple 1-GPU jobs simultaneously on an 8-GPU machine instead of queuing them sequentially.

When NOT needed:

  • Cloud auto-scaling: Select instance types that match your needs exactly (e.g., p3.2xlarge for 1 GPU, p3.8xlarge for 4 GPUs)

  • Kubernetes environments: Resource allocation is handled through runtime configuration

Dynamic allocation is only available for Virtual Machine (Dispatch) environments running dispatch mode workers.

Configure GPU Allocation

Set the VH_GPUS environment variable to specify how many GPUs your execution needs:

VH_GPUS=2

Your execution will wait in the queue until 2 GPUs become available on any machine in the environment.

Set via Web UI

When creating an execution in the web UI, add VH_GPUS as an environment variable in the execution configuration.

Set via valohai.yaml

- step:
    name: distributed-training
    image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
    command:
      - python train_distributed.py
    environment-variables:
      - name: VH_GPUS
        default: 4

Be careful with GPU requests. If you request more GPUs than any single machine has, your execution will remain queued indefinitely.

How GPU Scheduling Works

Valohai schedules queued executions first come, first served, with two prioritization rules layered on top (see the sketch after the list below):

Priority Rules

  1. Small jobs run first: If two executions are queued, the one requesting fewer GPUs gets priority

  2. Escalation after 1 hour: Executions waiting longer than 1 hour get elevated priority, preventing indefinite starvation of large multi-GPU jobs
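
The following is a minimal sketch of how these two rules combine into a single ordering. It is illustrative only, not Valohai's actual scheduler code, and it assumes each queued execution is represented as a dict with gpus_requested and submitted_at fields:

import time

ESCALATION_SECONDS = 60 * 60  # executions waiting longer than this jump the queue

def priority_key(execution, now):
    """Lower keys run first: escalated jobs, then fewer GPUs, then earliest submission."""
    escalated = (now - execution["submitted_at"]) > ESCALATION_SECONDS
    return (not escalated, execution["gpus_requested"], execution["submitted_at"])

def pick_next(queue, free_gpus, now=None):
    """Choose which queued execution to start on a machine with free_gpus GPUs available."""
    now = now if now is not None else time.time()
    runnable = [e for e in queue if e["gpus_requested"] <= free_gpus]
    return min(runnable, key=lambda e: priority_key(e, now), default=None)

With this ordering, newly submitted 1-GPU jobs keep running ahead of a queued 4-GPU job until the 4-GPU job has waited an hour, after which its key sorts ahead of every non-escalated job.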

GPU Assignment

GPUs are allocated in device index order — the same order tools like nvidia-smi display them:

# Your 2-GPU execution gets devices 0 and 1
nvidia-smi
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# |   0  Tesla V100-SXM2     On   | 00000000:00:1E.0 Off |                    0 |  ← First
# |   1  Tesla V100-SXM2     On   | 00000000:00:1F.0 Off |                    0 |  ← Second
# |   2  Tesla V100-SXM2     On   | 00000000:00:20.0 Off |                    0 |
# |   3  Tesla V100-SXM2     On   | 00000000:00:21.0 Off |                    0 |
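
To sanity-check the allocation from inside your code, you can compare VH_GPUS against what the CUDA runtime reports. A minimal sketch using PyTorch; it assumes PyTorch is installed in your image and that the worker exposes only the allocated devices to the execution:

import os
import torch

# Assumption: only the GPUs allocated to this execution are visible,
# so the device count should match VH_GPUS.
requested = os.environ.get("VH_GPUS", "not set")
print(f"VH_GPUS={requested}, visible CUDA devices: {torch.cuda.device_count()}")
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")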

Example Use Cases

Single-GPU Training on Multi-GPU Server

Run 8 experiments simultaneously on an 8-GPU machine:

- step:
    name: single-gpu-experiment
    image: tensorflow/tensorflow:latest-gpu
    command:
      - python train.py
    environment-variables:
      - name: VH_GPUS
        default: 1

Launch 8 executions — they'll all run in parallel instead of queuing.

Multi-GPU Distributed Training

Reserve 4 GPUs for a single distributed training job:

- step:
    name: distributed-training
    image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
    command:
      - torchrun --nproc_per_node=4 train_distributed.py
    environment-variables:
      - name: VH_GPUS
        default: 4
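
For reference, a minimal sketch of what a train_distributed.py entry point could look like with PyTorch DistributedDataParallel; the model here is a placeholder and your actual script will differ:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun --nproc_per_node=4 launches 4 processes and sets RANK/LOCAL_RANK/WORLD_SIZE
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 1).to(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])

    # ... build a dataloader with a DistributedSampler and run the training loop ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()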

Mixed Workload Scheduling

Queue both small and large jobs efficiently:

# Small job (1 GPU) - runs immediately
vh exec run quick-test --adhoc VH_GPUS=1

# Large job (4 GPUs) - waits for 4 GPUs to free up
vh exec run full-training --adhoc VH_GPUS=4

# Another small job (1 GPU) - runs before large job if submitted within 1 hour
vh exec run another-test --adhoc VH_GPUS=1

After 1 hour, the 4-GPU job escalates in priority and will run next, even if more 1-GPU jobs are queued.

Monitoring GPU Utilization

Track how effectively you're using GPU resources.
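
One lightweight approach is to sample utilization from inside the execution and print it as JSON, which Valohai collects from stdout as execution metadata. A sketch that shells out to nvidia-smi; it assumes nvidia-smi is available on the path in your image:

import json
import subprocess

def log_gpu_utilization():
    # Query per-GPU utilization and memory via nvidia-smi's CSV output
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    for line in result.stdout.strip().splitlines():
        index, util, mem = [v.strip() for v in line.split(",")]
        # JSON lines printed to stdout show up as execution metadata
        print(json.dumps({
            f"gpu{index}_utilization": float(util),
            f"gpu{index}_memory_mb": float(mem),
        }))

# Call this periodically from your training loop, e.g. once per epoch
log_gpu_utilization()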
