# Dynamic GPU Allocation

Split GPU resources on multi-GPU machines to run more jobs concurrently. Instead of dedicating entire machines to single executions, allocate only the GPUs each job needs.

This feature is particularly effective on on-premises servers with multiple GPUs. It is not enabled by default; your organization administrator must configure it first.

## When to Use Dynamic Allocation

**On-premises multi-GPU servers:** Run multiple 1-GPU jobs simultaneously on an 8-GPU machine instead of queuing them sequentially.

**When NOT needed:**

* **Cloud auto-scaling:** Select instance types that match your needs exactly (e.g., `p3.2xlarge` for 1 GPU, `p3.8xlarge` for 4 GPUs)
* **Kubernetes environments:** Resource allocation is handled through runtime configuration

Dynamic allocation is only available for **Virtual Machine (Dispatch)** environments running dispatch mode workers.

## Configure GPU Allocation

Set the `VH_GPUS` environment variable to specify how many GPUs your execution needs:

```shell
VH_GPUS=2
```

Your execution will wait in the queue until 2 GPUs become available on any machine in the environment.
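Inside a running execution, your code can check how many devices it was actually granted. The helper below is a hypothetical sketch, assuming `VH_GPUS` is visible inside the job and that allocated devices are exposed through the standard `CUDA_VISIBLE_DEVICES` mechanism (an assumption, not something this page guarantees):

```python
import os

def allocated_gpu_count() -> int:
    """Illustrative helper: estimate how many GPUs this execution received.
    Prefers CUDA_VISIBLE_DEVICES (the standard CUDA device mask) and falls
    back to the VH_GPUS request if no mask is set."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible:
        return len([d for d in visible.split(",") if d.strip()])
    return int(os.environ.get("VH_GPUS", "1"))

# Simulate the environment a 2-GPU execution would see
os.environ["VH_GPUS"] = "2"
os.environ.pop("CUDA_VISIBLE_DEVICES", None)
print(allocated_gpu_count())  # 2
```

A guard like this lets training code fail fast (or adjust its parallelism) if the visible device count does not match what was requested.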

### Set via Web UI

Add the environment variable in the execution configuration:

<figure><img src="https://4109720758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ff3mjTRQNkASbnMbJqzJ2%2Fuploads%2Fgit-blob-56fd6530347e6be05d2329ef3279513d31f39bb3%2Fimage.png?alt=media" alt=""><figcaption></figcaption></figure>

### Set via valohai.yaml

```yaml
- step:
    name: distributed-training
    image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
    command:
      - python train_distributed.py
    environment-variables:
      - name: VH_GPUS
        default: 4
```

> **Be careful with GPU requests.** If you request more GPUs than any single machine has, your execution will remain queued indefinitely.

## How GPU Scheduling Works

Valohai queues executions **first-come, first-served**, with two prioritization rules layered on top:

### Priority Rules

1. **Small jobs run first:** If two executions are queued, the one requesting fewer GPUs gets priority
2. **Escalation after 1 hour:** Executions waiting longer than 1 hour get elevated priority, preventing indefinite starvation of large multi-GPU jobs
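The two rules above can be sketched as a sort key: escalated jobs (waiting over an hour) come first, then jobs requesting fewer GPUs, then arrival order. This is an illustrative model of the described behavior, not Valohai's actual scheduler code:

```python
from dataclasses import dataclass

ESCALATION_SECONDS = 3600  # waits longer than 1 hour get elevated priority

@dataclass
class QueuedExecution:
    name: str
    gpus_requested: int
    queued_at: float  # epoch seconds when the execution entered the queue

def schedule_order(queue, now):
    """Sketch of the documented rules: escalated jobs first,
    then fewer GPUs requested, then first-come, first-served."""
    def key(e):
        escalated = (now - e.queued_at) > ESCALATION_SECONDS
        # False sorts before True, so escalated jobs lead the queue
        return (not escalated, e.gpus_requested, e.queued_at)
    return sorted(queue, key=key)

queue = [
    QueuedExecution("full-training", gpus_requested=4, queued_at=0),
    QueuedExecution("quick-test", gpus_requested=1, queued_at=1000),
]
# Early on, the 1-GPU job jumps ahead of the 4-GPU job:
print([e.name for e in schedule_order(queue, now=1200)])
# ['quick-test', 'full-training']
# After an hour, the 4-GPU job escalates and runs next:
print([e.name for e in schedule_order(queue, now=4000)])
# ['full-training', 'quick-test']
```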

### GPU Assignment

GPUs are allocated in device index order — the same order tools like `nvidia-smi` display them:

```shell
# Your 2-GPU execution gets devices 0 and 1
nvidia-smi
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# |   0  Tesla V100-SXM2     On   | 00000000:00:1E.0 Off |                    0 |  ← First
# |   1  Tesla V100-SXM2     On   | 00000000:00:1F.0 Off |                    0 |  ← Second
# |   2  Tesla V100-SXM2     On   | 00000000:00:20.0 Off |                    0 |
# |   3  Tesla V100-SXM2     On   | 00000000:00:21.0 Off |                    0 |
```
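The index-ordered assignment above amounts to granting the lowest-numbered free devices first. A minimal sketch of that policy (illustrative only, not Valohai's implementation):

```python
def assign_devices(free_indices, requested):
    """Grant the lowest-numbered free GPU indices, matching the
    device-index order shown by nvidia-smi. Returns None when not
    enough GPUs are free, i.e. the execution keeps waiting."""
    if requested > len(free_indices):
        return None
    return sorted(free_indices)[:requested]

print(assign_devices({0, 1, 2, 3}, 2))  # [0, 1]
print(assign_devices({2, 3}, 2))        # [2, 3]
```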

## Example Use Cases

### Single-GPU Training on Multi-GPU Server

Run 8 experiments simultaneously on an 8-GPU machine:

```yaml
- step:
    name: single-gpu-experiment
    image: tensorflow/tensorflow:latest-gpu
    command:
      - python train.py
    environment-variables:
      - name: VH_GPUS
        default: 1
```

Launch 8 executions — they'll all run in parallel instead of queuing.

### Multi-GPU Distributed Training

Reserve 4 GPUs for a single distributed training job:

```yaml
- step:
    name: distributed-training
    image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
    command:
      - torchrun --nproc_per_node=4 train_distributed.py
    environment-variables:
      - name: VH_GPUS
        default: 4
```

### Mixed Workload Scheduling

Queue both small and large jobs efficiently:

```shell
# Small job (1 GPU) - runs immediately
vh exec run quick-test --adhoc VH_GPUS=1

# Large job (4 GPUs) - waits for 4 GPUs to free up
vh exec run full-training --adhoc VH_GPUS=4

# Another small job (1 GPU) - runs before large job if submitted within 1 hour
vh exec run another-test --adhoc VH_GPUS=1
```

After 1 hour, the 4-GPU job escalates in priority and will run next, even if more 1-GPU jobs are queued.

## Monitoring GPU Utilization

Track how effectively you're using GPU resources:

* [Hardware Statistics](https://docs.valohai.com/observability/resource-monitoring/hardware-statistics) — Real-time GPU utilization during execution
* [Visualize Utilization](https://docs.valohai.com/observability/resource-monitoring/visualize-utilization) — Historical GPU usage patterns
* [Track Underutilization](https://docs.valohai.com/observability/resource-monitoring/track-underutilization) — Identify over-allocated GPUs

## Related Topics

* [Tasks & Parallel Execution](https://docs.valohai.com/tasks) — Run hyperparameter sweeps with dynamic GPU allocation
* [Distributed Training](https://docs.valohai.com/distributed-training) — Coordinate multi-GPU training across executions
* [Team Quotas](https://docs.valohai.com/user-and-organization-management/environments-and-access-control/team-quotas) — Limit concurrent GPU usage per team
