You can allocate a variable number of GPUs to each execution. This lets you use the GPU resources of an on-premises machine with multiple GPUs more efficiently and run more jobs concurrently. This feature is not enabled by default; your organization administrator needs to set it up before it can be used.
Dynamic resources are effective only on specific environment types
Dynamic resource controls are available, or even needed, only in specific environments.
For example, when using auto-scaling cloud machines, you can select an instance type that matches your needs from the start, so there is no need to split the resources further.
In a Kubernetes environment, resources are managed through the runtime configuration, so they do not need to be defined separately.
Environments of the Virtual Machine (Dispatch) type, i.e., environments running dispatch mode workers, allow users to specify how many GPUs each execution should use, instead of relying on the environment selection (most environments) or runtime configuration (Kubernetes environments) settings.
Setups like this allow, for example, more efficient resource utilization of on-premises GPUs without an additional scheduling layer like Kubernetes.
Setting the number of GPUs
You specify the number of GPUs through the VH_GPUS environment variable on the execution:
VH_GPUS=2
This means that your execution will be allocated 2 GPUs on one of the machines in the environment. It will remain in the queue until 2 GPUs are available on one of those machines.
This works through all the usual interfaces: the web UI, CLI, and API, or pre-defined in the Valohai YAML.
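If you pre-define it in the Valohai YAML, the variable can go under the step's environment variables. A minimal sketch of how this could look; the step name, image, and command here are placeholders, not part of this feature:

- step:
    name: train
    image: python:3.10
    command: python train.py
    environment-variables:
      - name: VH_GPUS
        default: "2"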
Be careful with the number of GPUs
Be careful with the number of GPUs you request. If you request more GPUs than any single machine in the environment has, your execution will remain in the queue indefinitely.
Dynamic GPUs default to 1
If you don’t specify the VH_GPUS environment variable on a GPU-machine environment, the execution will default to 1 GPU.
VH_GPUS=0 is not a valid value and will likewise default to 1 GPU.
How are the GPUs allocated?
First come, first served, in general.
When multiple executions are queued at the same time, workers prioritize smaller GPU workloads by default. For example, if two executions are queued, one requesting a single GPU and another requesting 2 GPUs, the workers will pick the single-GPU execution first.
Each execution has an internal escalation timeout of 1 hour. If the execution has not been allocated the requested resources within the hour, workers will start deprioritizing smaller workloads until the escalated execution is allocated.
On the worker level, GPUs are allocated in GPU device index order, i.e., the same order in which tools like nvidia-smi list the GPUs.
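For example, on a machine with two GPUs, the allocation order follows the indices reported by nvidia-smi; the model names and UUIDs below are purely illustrative:

$ nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-11111111-aaaa-bbbb-cccc-222222222222)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-33333333-dddd-eeee-ffff-444444444444)

An execution requesting a single GPU on an otherwise idle machine would be assigned GPU 0 first.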