# Dynamic GPU Allocation

Split GPU resources on multi-GPU machines to run more jobs concurrently. Instead of dedicating entire machines to single executions, allocate only the GPUs each job needs.

This feature is particularly effective on on-premises servers with multiple GPUs. It is not enabled by default; your organization administrator must configure it first.

## When to Use Dynamic Allocation

**On-premises multi-GPU servers:** Run multiple 1-GPU jobs simultaneously on an 8-GPU machine instead of queuing them sequentially.

**When NOT needed:**

* **Cloud auto-scaling:** Select instance types that match your needs exactly (e.g., `p3.2xlarge` for 1 GPU, `p3.8xlarge` for 4 GPUs)
* **Kubernetes environments:** Resource allocation is handled through runtime configuration

Dynamic allocation is only available for **Virtual Machine (Dispatch)** environments running dispatch mode workers.

## Configure GPU Allocation

Set the `VH_GPUS` environment variable to specify how many GPUs your execution needs:

```shell
VH_GPUS=2
```

Your execution waits in the queue until 2 GPUs are free on a single machine in the environment.
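
The training script can read the same value to size its own worker pool. The sketch below assumes `VH_GPUS` is visible to your code like any other environment variable set on the execution. `requested_gpus` is a hypothetical helper name, and the fallback of 1 is an arbitrary choice for this example.

```python
import os


def requested_gpus(env=None, fallback=1):
    """Return the GPU count requested through VH_GPUS.

    Hypothetical helper: falls back to `fallback` when the variable is
    missing or empty, and rejects values that are not positive integers.
    """
    env = os.environ if env is None else env
    raw = str(env.get("VH_GPUS", "")).strip()
    if not raw:
        return fallback
    count = int(raw)  # raises ValueError for values such as "two"
    if count < 1:
        raise ValueError(f"VH_GPUS must be a positive integer, got {raw!r}")
    return count


if __name__ == "__main__":
    print(f"Training with {requested_gpus()} GPU(s)")
```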

### Set via Web UI

Add the environment variable in the execution configuration:

<figure><img src="/files/hJZmdaFhipaj19N48iBH" alt=""><figcaption></figcaption></figure>

### Set via valohai.yaml

```yaml
- step:
    name: distributed-training
    image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
    command:
      - python train_distributed.py
    environment-variables:
      - name: VH_GPUS
        default: 4
```

> **Be careful with GPU requests.** If you request more GPUs than any single machine has, your execution will remain queued indefinitely.
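
A quick check before submission can catch such requests. The following is a minimal sketch, assuming you already know the GPU count of each machine in the environment. `can_ever_run` and the machine names are hypothetical and not part of any Valohai API.

```python
def can_ever_run(requested_gpus, gpus_per_machine):
    """Return True if at least one machine has enough GPUs for the request.

    `gpus_per_machine` maps a machine name to its total GPU count.
    """
    if requested_gpus < 1:
        raise ValueError("requested_gpus must be at least 1")
    return any(total >= requested_gpus for total in gpus_per_machine.values())


# Hypothetical on-premises environment with two servers.
machines = {"gpu-server-a": 8, "gpu-server-b": 4}

print(can_ever_run(4, machines))   # fits on either machine
print(can_ever_run(16, machines))  # larger than every machine: would stay queued
```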

## How GPU Scheduling Works

Valohai uses a **first-come, first-served** queue, adjusted by two priority rules.

### Priority Rules

1. **Small jobs run first:** If two executions are queued, the one requesting fewer GPUs gets priority
2. **Escalation after 1 hour:** Executions waiting longer than 1 hour get elevated priority, preventing indefinite starvation of large multi-GPU jobs
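
The two rules can be illustrated as a sort key. This is only a sketch of the behavior described above, not Valohai's actual scheduler: the `QueuedExecution` class, its field names, and the tie-breaking by submission time are assumptions made for the example.

```python
from dataclasses import dataclass

ESCALATION_SECONDS = 60 * 60  # the 1-hour threshold from rule 2


@dataclass
class QueuedExecution:
    name: str
    gpus: int            # value of VH_GPUS
    submitted_at: float  # seconds since an arbitrary start time


def queue_order(executions, now):
    """Sort executions so the next one to start comes first.

    Escalated executions (waiting longer than one hour) form the first
    group. Within each group, smaller GPU requests come first, then
    earlier submissions (an assumed tie-breaker).
    """
    def key(execution):
        escalated = (now - execution.submitted_at) > ESCALATION_SECONDS
        # False sorts before True, so negate to put escalated jobs first.
        return (not escalated, execution.gpus, execution.submitted_at)

    return sorted(executions, key=key)


queue = [
    QueuedExecution("full-training", gpus=4, submitted_at=0),
    QueuedExecution("quick-test", gpus=1, submitted_at=600),
]

# Twenty minutes in, neither job has escalated, so the 1-GPU job is first.
print([e.name for e in queue_order(queue, now=1200)])
# Just over an hour after its submission, the 4-GPU job has escalated.
print([e.name for e in queue_order(queue, now=3700)])
```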

### GPU Assignment

GPUs are allocated in device index order — the same order tools like `nvidia-smi` display them:

```shell
# On an otherwise idle machine, a 2-GPU execution gets devices 0 and 1
nvidia-smi
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# |   0  Tesla V100-SXM2     On   | 00000000:00:1E.0 Off |                    0 |  ← First
# |   1  Tesla V100-SXM2     On   | 00000000:00:1F.0 Off |                    0 |  ← Second
# |   2  Tesla V100-SXM2     On   | 00000000:00:20.0 Off |                    0 |
# |   3  Tesla V100-SXM2     On   | 00000000:00:21.0 Off |                    0 |
```
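
As an illustration of index-ordered assignment, the sketch below picks the lowest-numbered free devices on one machine. The function name and the tracked set of free devices are assumptions for the example; how the worker tracks free devices internally is not described here.

```python
def assign_devices(free_devices, requested):
    """Pick `requested` GPUs from `free_devices` in ascending index order.

    Returns a list of device indices, or None when the machine does not
    currently have enough free GPUs and the execution must keep waiting.
    """
    ordered = sorted(free_devices)
    if requested > len(ordered):
        return None
    return ordered[:requested]


# Idle 4-GPU machine: a 2-GPU execution gets devices 0 and 1.
print(assign_devices({0, 1, 2, 3}, 2))
# With devices 0 and 1 busy, the next 1-GPU execution gets device 2.
print(assign_devices({2, 3}, 1))
# Only two GPUs are free, so a 4-GPU request cannot start yet.
print(assign_devices({2, 3}, 4))
```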

## Example Use Cases

### Single-GPU Training on Multi-GPU Server

Run 8 experiments simultaneously on an 8-GPU machine:

```yaml
- step:
    name: single-gpu-experiment
    image: tensorflow/tensorflow:latest-gpu
    command:
      - python train.py
    environment-variables:
      - name: VH_GPUS
        default: 1
```

Launch 8 executions and, provided all 8 GPUs are free, they run in parallel instead of queuing.

### Multi-GPU Distributed Training

Reserve 4 GPUs for a single distributed training job:

```yaml
- step:
    name: distributed-training
    image: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
    command:
      - torchrun --nproc_per_node=4 train_distributed.py
    environment-variables:
      - name: VH_GPUS
        default: 4
```
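
The `--nproc_per_node` value should match the number of GPUs requested through `VH_GPUS`. One way to keep the two in sync is to derive the launch command from the environment variable, as in this sketch. It assumes `VH_GPUS` is visible inside the execution; `build_torchrun_command` is a hypothetical helper.

```python
import os
import shlex


def build_torchrun_command(script, env=None):
    """Build a torchrun argument list with one process per requested GPU."""
    env = os.environ if env is None else env
    gpus = int(env.get("VH_GPUS", "1"))  # assumed default of 1 when unset
    if gpus < 1:
        raise ValueError("VH_GPUS must be at least 1")
    return ["torchrun", f"--nproc_per_node={gpus}", script]


command = build_torchrun_command("train_distributed.py", {"VH_GPUS": "4"})
print(shlex.join(command))
```

A small launcher script could pass the resulting list to `subprocess.run(command, check=True)`.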

### Mixed Workload Scheduling

Queue both small and large jobs efficiently:

```shell
# Small job (1 GPU): starts as soon as 1 GPU is free
vh exec run quick-test --adhoc --var VH_GPUS=1

# Large job (4 GPUs): waits until 4 GPUs are free on one machine
vh exec run full-training --adhoc --var VH_GPUS=4

# Another small job (1 GPU): runs before the large job unless the large job has already waited over 1 hour
vh exec run another-test --adhoc --var VH_GPUS=1
```

Once the 4-GPU job has waited longer than 1 hour, its priority escalates and it runs next, even if more 1-GPU jobs are queued.

## Monitoring GPU Utilization

Track how effectively you're using GPU resources:

* [Hardware Statistics](/observability/resource-monitoring/hardware-statistics.md) — Real-time GPU utilization during execution
* [Visualize Utilization](/observability/resource-monitoring/visualize-utilization.md) — Historical GPU usage patterns
* [Track Underutilization](/observability/resource-monitoring/track-underutilization.md) — Identify over-allocated GPUs

## Related Topics

* [Tasks & Parallel Execution](/tasks.md) — Run hyperparameter sweeps with dynamic GPU allocation
* [Distributed Training](/distributed-training.md) — Coordinate multi-GPU training across executions
* [Team Quotas](/user-and-organization-management/environments-and-access-control/team-quotas.md) — Limit concurrent GPU usage per team


