Configure Environments & Scaling

Environments represent different machine types available for running executions — from small CPU instances to multi-GPU servers. Organization administrators configure scaling behavior, access controls, and experimental features for each environment.

Access Environment Settings

  1. Click Hi, <username> in the top-right corner

  2. Select Manage <organization>

  3. Open the Environments tab

  4. Click on an environment to configure its settings
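If you administer many environments, the same inventory can also be read through the Valohai REST API instead of clicking through each one. The sketch below is a minimal example that assumes a personal API token stored in a VALOHAI_API_TOKEN environment variable and that the /api/v0/environments/ endpoint returns fields such as slug and enabled; verify the exact schema against the API reference.

    # Minimal sketch: list environments over the Valohai REST API.
    # Assumes a personal API token in VALOHAI_API_TOKEN; the field names
    # ("results", "slug", "enabled") are assumptions -- check the API reference.
    import os
    import requests

    response = requests.get(
        "https://app.valohai.com/api/v0/environments/",
        headers={"Authorization": f"Token {os.environ['VALOHAI_API_TOKEN']}"},
    )
    response.raise_for_status()
    for env in response.json().get("results", []):
        print(env.get("slug"), "enabled:", env.get("enabled"))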

Core Settings

Enabled

Purpose: Control whether users can select this environment for executions.

When to disable:

  • Deprecated machine types being phased out

  • Environments under maintenance

  • Cost-prohibitive machines that need explicit approval before use

Effect: Disabled environments don't appear in environment selection dropdowns.

Allow Personal Usage

Purpose: Determine if users can use this environment in projects not owned by your organization.

Use cases:

  • Enabled: Users can experiment with personal side projects using organization infrastructure

  • Disabled: Restrict expensive machines to organization-owned projects only

Example: Enable for cheap CPU machines, disable for expensive multi-GPU instances.

Scale-Down Grace Period

Purpose: The number of minutes to wait after the last execution completes before terminating the machine.

Default: 15 minutes

Why it matters: Starting new machines takes time (boot, Docker pull, data download). Keeping machines warm for a short period lets you launch follow-up executions without waiting for infrastructure setup.

Tuning guidance:

  • Short grace period (5-10 min): Cost-sensitive, infrequent workloads

  • Medium grace period (15-20 min): Balanced for iterative development

  • Long grace period (30-60 min): Rapid experimentation, pipeline workflows

Example: Set 30 minutes for GPU machines used in active development, 5 minutes for batch processing machines.
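To put numbers on this trade-off, you can estimate how much idle time a given grace period adds over a month. The sketch below is a back-of-the-envelope calculation with hypothetical prices and usage; substitute your own.

    # Back-of-the-envelope idle cost added by the scale-down grace period.
    # Hourly price and scale-down frequency below are hypothetical examples.
    def grace_period_cost(hourly_price, grace_minutes, scale_downs_per_day, days=30):
        idle_hours = (grace_minutes / 60) * scale_downs_per_day * days
        return idle_hours * hourly_price

    # A $3/hour GPU machine that scales down about 4 times a day:
    print(grace_period_cost(3.0, grace_minutes=30, scale_downs_per_day=4))  # ~$180/month
    print(grace_period_cost(3.0, grace_minutes=5, scale_downs_per_day=4))   # ~$30/month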

Min. Scale

Purpose: Minimum number of machines kept running at all times.

Default: 0 (no machines kept warm)

When to use:

  • Min. Scale > 0: Keep machines warm for latency-sensitive workflows (e.g., real-time inference, rapid prototyping)

  • Min. Scale = 0: Standard cost-optimized approach

Cost impact: Machines count toward billable hours even when idle.

Example: Set Min. Scale = 1 for a production inference environment to eliminate cold-start latency.
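Since idle warm machines bill the same as busy ones, it is worth estimating the always-on baseline before raising Min. Scale. A minimal sketch with a hypothetical hourly price:

    # Monthly baseline cost of a warm pool (Min. Scale > 0); pricing is hypothetical.
    def warm_pool_cost(min_scale, hourly_price, hours_per_month=730):
        return min_scale * hourly_price * hours_per_month

    print(warm_pool_cost(min_scale=1, hourly_price=0.90))  # ~$657/month for one warm machine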

Max. Scale

Purpose: Maximum number of machines Valohai can launch in parallel.

Default: 5 machines

Considerations:

  • Cloud provider quotas (AWS vCPU limits, GCP GPU quotas)

  • Budget constraints

  • Concurrency needs for hyperparameter sweeps

Example: Set Max. Scale = 20 for teams running large hyperparameter sweeps, 2 for expensive multi-GPU machines.

After reaching the maximum, new executions queue until machines become available.
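For batches larger than Max. Scale, executions run in waves of at most Max. Scale at a time, so the total wall-clock time can be estimated up front. A rough sketch, assuming roughly equal execution durations:

    # Rough makespan estimate when a batch exceeds Max. Scale.
    # Assumes roughly equal execution durations; real runs will vary.
    import math

    def makespan_minutes(num_executions, max_scale, avg_minutes):
        waves = math.ceil(num_executions / max_scale)
        return waves * avg_minutes

    # 100 hyperparameter trials, Max. Scale = 20, ~45 min each -> about 225 min
    print(makespan_minutes(100, max_scale=20, avg_minutes=45))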

Per-User Quota

Purpose: Maximum machines one user can run simultaneously on this environment.

Default: 0 (no limit)

When to set:

  • Prevent single users from monopolizing expensive resources

  • Distribute limited GPU quota fairly across team members

  • Enforce cost controls on a per-user basis

Example: Set Per-User Quota = 2 for A100 GPU machines so no individual can consume all capacity.

Scaling Behavior

Valohai auto-scales based on queue depth:

  • Execution submitted → No machine available → Valohai launches a new machine → Execution starts

  • Last execution completes → No new executions during the grace period → Valohai terminates the machine

  • Max. Scale reached → New executions queue → They start when a machine becomes available

Example Scenario

Environment: aws-gpu-v100 (Max. Scale = 5, Grace Period = 15 min)

  1. User submits 10 executions

  2. Valohai launches 5 machines (max scale limit)

  3. 5 executions start immediately, 5 queue

  4. As executions complete, queued jobs start on available machines

  5. After last execution finishes, machines stay warm for 15 minutes

  6. If no new executions submitted, machines terminate after grace period
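This flow can be sketched as a small minute-by-minute simulation: queued executions start as machines free up, and machines shut down once the grace period passes with nothing left to run. The code below illustrates the behavior described above; it is not Valohai's actual scheduler, and it assumes every execution takes the same amount of time.

    # Toy minute-by-minute simulation of the scenario above.
    # Illustrative only -- this is not Valohai's actual scheduler.
    def simulate(num_executions, max_scale, exec_minutes, grace_minutes):
        queued, running, machines, minute = num_executions, [], 0, 0
        while queued or running:
            # Start queued work on idle machines, launching new ones up to Max. Scale.
            while queued and (len(running) < machines or machines < max_scale):
                if len(running) >= machines:
                    machines += 1  # launch a new machine
                running.append(exec_minutes)
                queued -= 1
            minute += 1  # advance one minute
            running = [m - 1 for m in running if m - 1 > 0]
        return minute, minute + grace_minutes

    # 10 executions of ~60 min, Max. Scale = 5, 15 min grace period:
    last_finish, terminated = simulate(10, max_scale=5, exec_minutes=60, grace_minutes=15)
    print(last_finish, terminated)  # two waves: last job ends ~120 min, machines gone ~135 min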

Experimental Features

Enable Smart VM Selection

Purpose: Improve cache utilization by intelligently selecting which execution runs on which machine.

How it works: Instead of first-in-first-out queuing, workers prioritize jobs with data already cached locally. This reduces download times and accelerates startup.

When to enable:

  • Environments with frequently-reused large datasets

  • Teams running similar experiments repeatedly (e.g., model architecture sweeps)

  • On-premises servers with persistent cache storage

Activation time: Allow 1-2 hours for the feature to activate after enabling.

Status: Experimental — behavior may change without notice.

Enable Extended Stats: Smart VM Selection requires the "Enable extended stats" setting to also be enabled; extended stats collects the historical execution data used for cache-aware scheduling.
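Conceptually, the cache-aware selection can be pictured as a scoring step: among queued jobs, the worker prefers the one with the most input data already in its local cache, falling back to submission order on ties. The sketch below only illustrates that idea with made-up job and dataset names; it is not the feature's actual implementation.

    # Illustration of cache-aware job selection (not the real implementation).
    # Job and dataset names are made up for the example.
    def pick_next_job(queued_jobs, cached_datasets):
        def cached_megabytes(job):
            return sum(size for name, size in job["inputs"].items() if name in cached_datasets)
        # max() keeps the first (oldest) job on ties, i.e. FIFO as the fallback.
        return max(queued_jobs, key=cached_megabytes)

    queue = [
        {"id": "exec-1", "inputs": {"imagenet-subset": 40_000, "labels": 5}},
        {"id": "exec-2", "inputs": {"fresh-dataset": 60_000}},
    ]
    print(pick_next_job(queue, cached_datasets={"imagenet-subset"})["id"])  # exec-1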

Environment Types

Different environment types have different configuration needs:

Cloud Auto-Scaling (AWS, GCP, Azure)

Characteristics:

  • Machines launch on-demand

  • Scaling settings actively used

  • Grace periods reduce startup latency

Recommended settings:

  • Min. Scale = 0 (cost-optimized)

  • Max. Scale = Based on cloud quotas and budget

  • Grace Period = 15-30 min (balance cost vs convenience)

On-Premises Dispatch Workers

Characteristics:

  • Fixed machines (not dynamically scaled)

  • Scaling settings less relevant

  • Grace periods irrelevant (machines always running)

Recommended settings:

  • Min. Scale = Max. Scale = Number of physical machines

  • Enable Smart VM Selection for efficient cache use

Kubernetes

Characteristics:

  • Resource allocation via Kubernetes runtime config

  • Scaling settings typically not used

  • Kubernetes manages pod scheduling

Recommended settings:

  • Configured through Kubernetes manifests rather than Valohai settings

See Kubernetes Workers for detailed Kubernetes configuration.

Common Configuration Patterns

Cost-Optimized Development

Environment: aws-cpu-small
- Enabled: Yes
- Allow Personal Usage: Yes
- Scale-Down Grace Period: 10 min
- Min. Scale: 0
- Max. Scale: 10
- Per-User Quota: 0 (no limit)

Aggressive scale-down, no reserved capacity, suitable for inexpensive experimentation.

Production Inference

Environment: aws-gpu-inference
- Enabled: Yes
- Allow Personal Usage: No
- Scale-Down Grace Period: 60 min
- Min. Scale: 1
- Max. Scale: 3
- Per-User Quota: 0

Keep one machine warm for immediate response, longer grace period for burst traffic.

Expensive Multi-GPU Training

Environment: aws-p4d-24xlarge
- Enabled: Yes
- Allow Personal Usage: No
- Scale-Down Grace Period: 5 min
- Min. Scale: 0
- Max. Scale: 2
- Per-User Quota: 1

Strict limits due to high cost ($32/hour), quick scale-down, one machine per user.

Hyperparameter Sweep Environment

Environment: aws-spot-g4dn
- Enabled: Yes
- Allow Personal Usage: Yes
- Scale-Down Grace Period: 15 min
- Min. Scale: 0
- Max. Scale: 20
- Per-User Quota: 0
- Enable Smart VM Selection: Yes

High parallelism for sweeps, cache optimization for repeated experiments, spot instances for cost savings.
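If you maintain several such patterns, a small script can sanity-check the values before you enter them in the UI. The helper below is a hypothetical local check, not a Valohai feature, and the rules are examples to adapt to your own policies.

    # Hypothetical local sanity checks for environment settings (not a Valohai feature).
    def check_settings(name, settings):
        problems = []
        if settings["min_scale"] > settings["max_scale"]:
            problems.append("Min. Scale exceeds Max. Scale")
        if settings["per_user_quota"] > settings["max_scale"]:
            problems.append("Per-User Quota exceeds Max. Scale")
        return name, problems or ["looks consistent"]

    print(check_settings("aws-p4d-24xlarge",
                         {"min_scale": 0, "max_scale": 2, "per_user_quota": 1}))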
