Configure Environments & Scaling
Environments represent different machine types available for running executions — from small CPU instances to multi-GPU servers. Organization administrators configure scaling behavior, access controls, and experimental features for each environment.
Access Environment Settings
1. Click Hi, <username> in the top-right corner
2. Select Manage <organization>
3. Open the Environments tab
4. Click on an environment to configure its settings
Core Settings
Enabled
Purpose: Control whether users can select this environment for executions.
When to disable:
Deprecated machine types being phased out
Environments under maintenance
Cost-prohibitive machines that need explicit approval before use
Effect: Disabled environments don't appear in environment selection dropdowns.
Allow Personal Usage
Purpose: Determine whether users can use this environment in projects not owned by your organization.
Use cases:
Enabled: Users can experiment with personal side projects using organization infrastructure
Disabled: Restrict expensive machines to organization-owned projects only
Example: Enable for cheap CPU machines, disable for expensive multi-GPU instances.
Scale-Down Grace Period
Purpose: How many minutes to wait after the last execution completes before terminating the machine.
Default: 15 minutes
Why it matters: Starting new machines takes time (boot, Docker pull, data download). Keeping machines warm for a short period lets you launch follow-up executions without waiting for infrastructure setup.
Tuning guidance:
Short grace period (5-10 min): Cost-sensitive, infrequent workloads
Medium grace period (15-20 min): Balanced for iterative development
Long grace period (30-60 min): Rapid experimentation, pipeline workflows
Example: Set 30 minutes for GPU machines used in active development, 5 minutes for batch processing machines.
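To put a number on the trade-off, here is a minimal sketch (plain Python with a hypothetical hourly rate, not a Valohai API) of the extra cost each scale-down event incurs:

```python
def idle_cost(grace_minutes: float, hourly_rate: float) -> float:
    """Cost of keeping one machine warm for the full grace period."""
    return hourly_rate * grace_minutes / 60

# Hypothetical $3.00/hour GPU machine with a 30-minute grace period:
print(f"${idle_cost(30, 3.00):.2f} per scale-down event")  # $1.50
```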
Min. Scale
Purpose: Minimum number of machines kept running at all times.
Default: 0 (no machines kept warm)
When to use:
Min. Scale > 0: Keep machines warm for latency-sensitive workflows (e.g., real-time inference, rapid prototyping)
Min. Scale = 0: Standard cost-optimized approach
Cost impact: Machines count toward billable hours even when idle.
Example: Set Min. Scale = 1 for a production inference environment to eliminate cold-start latency.
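To see why the cost impact matters, a rough sketch (hypothetical hourly rate, assuming the warm machine would otherwise sit idle):

```python
# Min. Scale = 1 keeps one machine billing around the clock.
hourly_rate = 1.20                    # hypothetical rate
hours_per_month = 24 * 30
print(f"${hourly_rate * hours_per_month:.2f}/month")  # $864.00/month of warm capacity
```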
Max. Scale
Purpose: Maximum number of machines Valohai can launch in parallel.
Default: 5 machines
Considerations:
Cloud provider quotas (AWS vCPU limits, GCP GPU quotas)
Budget constraints
Concurrency needs for hyperparameter sweeps
Example: Set Max. Scale = 20 for teams running large hyperparameter sweeps, 2 for expensive multi-GPU machines.
After reaching the maximum, new executions queue until machines become available.
Per-User Quota
Purpose: Maximum number of machines a single user can run simultaneously on this environment.
Default: 0 (no limit)
When to set:
Prevent single users from monopolizing expensive resources
Distribute limited GPU quota fairly across team members
Enforce cost controls on a per-user basis
Example: Set Per-User Quota = 2 for A100 GPU machines so no individual can consume all capacity.
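The admission check behaves roughly like the sketch below (illustrative Python, not Valohai's actual implementation), with 0 treated as unlimited:

```python
def within_user_quota(user_running: int, per_user_quota: int) -> bool:
    """Can this user start another execution on this environment?"""
    if per_user_quota == 0:              # 0 means no per-user limit
        return True
    return user_running < per_user_quota

print(within_user_quota(user_running=2, per_user_quota=2))  # False: at quota
print(within_user_quota(user_running=2, per_user_quota=0))  # True: unlimited
```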
Scaling Behavior
Valohai auto-scales based on queue depth:
Execution submitted → Machine not available → Valohai launches new machine → Execution starts
Grace period expires → No queued executions → Valohai terminates machine
Max scale reached → New executions queue → Start when machine becomes available
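The flow above can be modeled roughly as a per-environment decision (an illustrative Python sketch of the described behavior, not Valohai's scheduler code):

```python
from dataclasses import dataclass

@dataclass
class EnvironmentState:
    running_machines: int
    queued_executions: int
    idle_minutes: float    # minutes since the last execution finished

def scaling_decision(state: EnvironmentState, max_scale: int,
                     min_scale: int, grace_minutes: float) -> str:
    """Decide the next scaling action for one environment."""
    if state.queued_executions > 0 and state.running_machines < max_scale:
        return "launch machine"
    if (state.queued_executions == 0
            and state.running_machines > min_scale
            and state.idle_minutes >= grace_minutes):
        return "terminate idle machine"
    return "no change"

# Three executions queued with 2 of 5 machines running -> launch another.
state = EnvironmentState(running_machines=2, queued_executions=3, idle_minutes=0)
print(scaling_decision(state, max_scale=5, min_scale=0, grace_minutes=15))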
Example Scenario
Environment: aws-gpu-v100 (Max. Scale = 5, Grace Period = 15 min)
User submits 10 executions
Valohai launches 5 machines (max scale limit)
5 executions start immediately, 5 queue
As executions complete, queued jobs start on available machines
After last execution finishes, machines stay warm for 15 minutes
If no new executions submitted, machines terminate after grace period
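Assuming, purely for illustration, that every execution takes 45 minutes (and ignoring machine startup time), the timeline works out as follows:

```python
import math

executions, max_scale = 10, 5
run_min, grace_min = 45, 15                 # hypothetical runtime; grace from settings

waves = math.ceil(executions / max_scale)   # 2 waves of 5 parallel executions
last_finish = waves * run_min               # last execution ends at minute 90
terminate_at = last_finish + grace_min      # idle machines terminate at minute 105
print(waves, last_finish, terminate_at)     # 2 90 105
```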
Experimental Features
Enable Smart VM Selection
Purpose: Improve cache utilization by intelligently selecting which execution runs on which machine.
How it works: Instead of first-in-first-out queuing, workers prioritize jobs with data already cached locally. This reduces download times and accelerates startup.
When to enable:
Environments with frequently-reused large datasets
Teams running similar experiments repeatedly (e.g., model architecture sweeps)
On-premises servers with persistent cache storage
Activation time: Allow 1-2 hours for the feature to activate after enabling.
Status: Experimental — behavior may change without notice.
Enable Extended Stats: Smart VM Selection requires the "Enable extended stats" setting to be turned on as well; extended stats collect the historical execution data used for cache-aware scheduling.
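Conceptually, cache-aware selection looks something like the sketch below (illustrative Python only; the `cached` set is a hypothetical stand-in for whatever data a worker already holds locally):

```python
def pick_job(queue: list[dict], cached: set[str]) -> dict:
    """Prefer the queued job with the most inputs already in the local cache;
    fall back to plain FIFO when nothing overlaps."""
    best = max(queue, key=lambda job: len(set(job["inputs"]) & cached))
    if set(best["inputs"]) & cached:
        return best
    return queue[0]                          # FIFO fallback

queue = [
    {"id": "a", "inputs": ["imagenet.tar"]},
    {"id": "b", "inputs": ["tiny.csv"]},
]
print(pick_job(queue, cached={"imagenet.tar"})["id"])  # "a": its dataset is cached
```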
Environment Types
Different environment types have different configuration needs:
Cloud Auto-Scaling (AWS, GCP, Azure)
Characteristics:
Machines launch on-demand
Scaling settings actively used
Grace periods reduce startup latency
Recommended settings:
Min. Scale = 0 (cost-optimized)
Max. Scale = Based on cloud quotas and budget
Grace Period = 15-30 min (balance cost vs convenience)
On-Premises Dispatch Workers
Characteristics:
Fixed machines (not dynamically scaled)
Scaling settings less relevant
Grace periods irrelevant (machines always running)
Recommended settings:
Min. Scale = Max. Scale = Number of physical machines
Enable Smart VM Selection for efficient cache use
Kubernetes
Characteristics:
Resource allocation via Kubernetes runtime config
Scaling settings typically not used
Kubernetes manages pod scheduling
Recommended settings:
Configured through Kubernetes manifests rather than Valohai settings
See Kubernetes Workers for detailed Kubernetes configuration.
Common Configuration Patterns
Cost-Optimized Development
Environment: aws-cpu-small
- Enabled: Yes
- Allow Personal Usage: Yes
- Scale-Down Grace Period: 10 min
- Min. Scale: 0
- Max. Scale: 10
- Per-User Quota: 0 (no limit)
Aggressive scale-down, no reserved capacity, suitable for inexpensive experimentation.
Production Inference
Environment: aws-gpu-inference
- Enabled: Yes
- Allow Personal Usage: No
- Scale-Down Grace Period: 60 min
- Min. Scale: 1
- Max. Scale: 3
- Per-User Quota: 0
Keep one machine warm for immediate response, with a longer grace period to absorb burst traffic.
Expensive Multi-GPU Training
Environment: aws-p4d-24xlarge
- Enabled: Yes
- Allow Personal Usage: No
- Scale-Down Grace Period: 5 min
- Min. Scale: 0
- Max. Scale: 2
- Per-User Quota: 1
Strict limits due to high cost ($32/hour), quick scale-down, one machine per user.
Hyperparameter Sweep Environment
Environment: aws-spot-g4dn
- Enabled: Yes
- Allow Personal Usage: Yes
- Scale-Down Grace Period: 15 min
- Min. Scale: 0
- Max. Scale: 20
- Per-User Quota: 0
- Enable Smart VM Selection: Yes
High parallelism for sweeps, cache optimization for repeated experiments, spot instances for cost savings.
Related Topics
Team Quotas — Set per-team limits for environments
Dynamic GPU Allocation — Split GPUs on multi-GPU machines
Spot Instances — Use interruptible VMs for cost savings
Kubernetes Autoscaling — Configure Kubernetes-based scaling