Dynamic GPU Allocation
Split GPU resources on multi-GPU machines to run more jobs concurrently. Instead of dedicating entire machines to single executions, allocate only the GPUs each job needs.
This feature is particularly effective on on-premises servers with multiple GPUs. It isn't enabled by default; your organization administrator must configure it first.
When to Use Dynamic Allocation
On-premises multi-GPU servers: Run multiple 1-GPU jobs simultaneously on an 8-GPU machine instead of queuing them sequentially.
When NOT needed:
Cloud auto-scaling: Select instance types that match your needs exactly (e.g., p3.2xlarge for 1 GPU, p3.8xlarge for 4 GPUs)
Kubernetes environments: Resource allocation is handled through runtime configuration
Dynamic allocation is only available for Virtual Machine (Dispatch) environments running dispatch mode workers.
Configure GPU Allocation
Set the VH_GPUS environment variable to specify how many GPUs your execution needs:
VH_GPUS=2

Your execution will wait in the queue until 2 GPUs become available on any machine in the environment.
Set via Web UI
Add VH_GPUS and its value under the environment variables in the execution configuration:

Set via valohai.yaml
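A minimal valohai.yaml sketch; the step name, image, and command are illustrative, and the environment-variables section sets VH_GPUS for every execution of the step:

- step:
    name: train
    image: pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
    command: python train.py
    environment-variables:
      # Request 2 GPUs for each execution of this step (the value is a string)
      - name: VH_GPUS
        default: "2"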
Be careful with GPU requests. If you request more GPUs than any single machine has, your execution will remain queued indefinitely.
How GPU Scheduling Works
Valohai uses a first-come, first-served queue, adjusted by two prioritization rules:
Priority Rules
Small jobs run first: If two executions are queued, the one requesting fewer GPUs gets priority
Escalation after 1 hour: Executions waiting longer than 1 hour get elevated priority, preventing indefinite starvation of large multi-GPU jobs
GPU Assignment
GPUs are allocated in device index order — the same order tools like nvidia-smi display them. For example, on an idle 4-GPU machine, a job requesting 2 GPUs is assigned devices 0 and 1, and the next 1-GPU job receives device 2.
Example Use Cases
Single-GPU Training on Multi-GPU Server
Run 8 experiments simultaneously on an 8-GPU machine:
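For example, a step that claims a single GPU per execution (step name and image are illustrative):

- step:
    name: experiment
    image: pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
    command: python train.py
    environment-variables:
      # Each execution claims 1 of the machine's 8 GPUs
      - name: VH_GPUS
        default: "1"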
Launch 8 executions — they'll all run in parallel instead of queuing.
Multi-GPU Distributed Training
Reserve 4 GPUs for a single distributed training job:
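For example (step name and image are illustrative):

- step:
    name: distributed-train
    image: pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
    command: python train.py
    environment-variables:
      # The execution waits until 4 GPUs are free on a single machine
      - name: VH_GPUS
        default: "4"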
Mixed Workload Scheduling
Queue both small and large jobs efficiently:
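A sketch defining both job sizes side by side (step names and image are illustrative):

- step:
    name: quick-experiment
    image: pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
    command: python train.py
    environment-variables:
      # Small jobs: scheduled first under the priority rules
      - name: VH_GPUS
        default: "1"

- step:
    name: large-training
    image: pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
    command: python train.py
    environment-variables:
      # Large job: initially yields to smaller requests
      - name: VH_GPUS
        default: "4"

Suppose an 8-GPU machine is kept busy by a steady stream of quick-experiment runs while a large-training run waits in the queue.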
After 1 hour, the 4-GPU job escalates in priority and will run next, even if more 1-GPU jobs are queued.
Monitoring GPU Utilization
Track how effectively you're using GPU resources:
Hardware Statistics — Real-time GPU utilization during execution
Visualize Utilization — Historical GPU usage patterns
Track Underutilization — Identify over-allocated GPUs
Related Topics
Tasks & Parallel Execution — Run hyperparameter sweeps with dynamic GPU allocation
Distributed Training — Coordinate multi-GPU training across executions
Team Quotas — Limit concurrent GPU usage per team