Configure Environments & Scaling

Environments represent different machine types available for running executions — from small CPU instances to multi-GPU servers. Organization administrators configure scaling behavior, access controls, and experimental features for each environment.

Access Environment Settings

  1. Click Hi, <username> in the top-right corner

  2. Select Manage <organization>

  3. Open the Environments tab

  4. Click on an environment to configure its settings
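If you administer many environments, the same inventory can also be read through the Valohai REST API instead of clicking through each one. The sketch below is a minimal example that assumes a personal API token stored in a VALOHAI_API_TOKEN environment variable and that the /api/v0/environments/ endpoint returns fields such as slug and enabled; verify the exact schema against the API reference.

    # Minimal sketch: list environments over the Valohai REST API.
    # Assumes a personal API token in VALOHAI_API_TOKEN; the field names
    # ("results", "slug", "enabled") are assumptions -- check the API reference.
    import os
    import requests

    response = requests.get(
        "https://app.valohai.com/api/v0/environments/",
        headers={"Authorization": f"Token {os.environ['VALOHAI_API_TOKEN']}"},
    )
    response.raise_for_status()
    for env in response.json().get("results", []):
        print(env.get("slug"), "enabled:", env.get("enabled"))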

Core Settings

Enabled

Purpose: Control whether users can select this environment for executions.

When to disable:

  • Deprecated machine types being phased out

  • Environments under maintenance

  • Cost-prohibitive machines that need explicit approval before use

Effect: Disabled environments don't appear in environment selection dropdowns.

Allow Personal Usage

Purpose: Determine if users can use this environment in projects not owned by your organization.

Use cases:

  • Enabled: Users can experiment with personal side projects using organization infrastructure

  • Disabled: Restrict expensive machines to organization-owned projects only

Example: Enable for cheap CPU machines, disable for expensive multi-GPU instances.

Scale-Down Grace Period

Purpose: The number of minutes to wait after the last execution completes before terminating the machine.

Default: 15 minutes

Why it matters: Starting new machines takes time (boot, Docker pull, data download). Keeping machines warm for a short period lets you launch follow-up executions without waiting for infrastructure setup.

Tuning guidance:

  • Short grace period (5-10 min): Cost-sensitive, infrequent workloads

  • Medium grace period (15-20 min): Balanced for iterative development

  • Long grace period (30-60 min): Rapid experimentation, pipeline workflows

Example: Set 30 minutes for GPU machines used in active development, 5 minutes for batch processing machines.
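To put numbers on this trade-off, you can estimate how much idle time a given grace period adds over a month. The sketch below is a back-of-the-envelope calculation with hypothetical prices and usage; substitute your own.

    # Back-of-the-envelope idle cost added by the scale-down grace period.
    # Hourly price and scale-down frequency below are hypothetical examples.
    def grace_period_cost(hourly_price, grace_minutes, scale_downs_per_day, days=30):
        idle_hours = (grace_minutes / 60) * scale_downs_per_day * days
        return idle_hours * hourly_price

    # A $3/hour GPU machine that scales down about 4 times a day:
    print(grace_period_cost(3.0, grace_minutes=30, scale_downs_per_day=4))  # ~$180/month
    print(grace_period_cost(3.0, grace_minutes=5, scale_downs_per_day=4))   # ~$30/month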

Min. Scale

Purpose: Minimum number of machines kept running at all times.

Default: 0 (no machines kept warm)

When to use:

  • Min. Scale > 0: Keep machines warm for latency-sensitive workflows (e.g., real-time inference, rapid prototyping)

  • Min. Scale = 0: Standard cost-optimized approach

Cost impact: Machines count toward billable hours even when idle.

Example: Set Min. Scale = 1 for a production inference environment to eliminate cold-start latency.
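Since idle warm machines bill the same as busy ones, it is worth estimating the always-on baseline before raising Min. Scale. A minimal sketch with a hypothetical hourly price:

    # Monthly baseline cost of a warm pool (Min. Scale > 0); pricing is hypothetical.
    def warm_pool_cost(min_scale, hourly_price, hours_per_month=730):
        return min_scale * hourly_price * hours_per_month

    print(warm_pool_cost(min_scale=1, hourly_price=0.90))  # ~$657/month for one warm machine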

Max. Scale

Purpose: Maximum number of machines Valohai can launch in parallel.

Default: 5 machines

Considerations:

  • Cloud provider quotas (AWS vCPU limits, GCP GPU quotas)

  • Budget constraints

  • Concurrency needs for hyperparameter sweeps

Example: Set Max. Scale = 20 for teams running large hyperparameter sweeps, 2 for expensive multi-GPU machines.

After reaching the maximum, new executions queue until machines become available.
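For batches larger than Max. Scale, executions run in waves of at most Max. Scale at a time, so the total wall-clock time can be estimated up front. A rough sketch, assuming roughly equal execution durations:

    # Rough makespan estimate when a batch exceeds Max. Scale.
    # Assumes roughly equal execution durations; real runs will vary.
    import math

    def makespan_minutes(num_executions, max_scale, avg_minutes):
        waves = math.ceil(num_executions / max_scale)
        return waves * avg_minutes

    # 100 hyperparameter trials, Max. Scale = 20, ~45 min each -> about 225 min
    print(makespan_minutes(100, max_scale=20, avg_minutes=45))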

Per-User Quota

Purpose: Maximum machines one user can run simultaneously on this environment.

Default: 0 (no limit)

When to set:

  • Prevent single users from monopolizing expensive resources

  • Distribute limited GPU quota fairly across team members

  • Enforce cost controls on a per-user basis

Example: Set Per-User Quota = 2 for A100 GPU machines so no individual can consume all capacity.

Scaling Behavior

Valohai auto-scales based on queue depth:

  • Execution submitted → No machine available → Valohai launches a new machine → Execution starts

  • Last execution completes → No new executions during the grace period → Valohai terminates the machine

  • Max. Scale reached → New executions queue → They start when a machine becomes available

Example Scenario

Environment: aws-gpu-v100 (Max. Scale = 5, Grace Period = 15 min)

  1. User submits 10 executions

  2. Valohai launches 5 machines (max scale limit)

  3. 5 executions start immediately, 5 queue

  4. As executions complete, queued jobs start on available machines

  5. After last execution finishes, machines stay warm for 15 minutes

  6. If no new executions submitted, machines terminate after grace period
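This flow can be sketched as a small minute-by-minute simulation: queued executions start as machines free up, and machines shut down once the grace period passes with nothing left to run. The code below illustrates the behavior described above; it is not Valohai's actual scheduler, and it assumes every execution takes the same amount of time.

    # Toy minute-by-minute simulation of the scenario above.
    # Illustrative only -- this is not Valohai's actual scheduler.
    def simulate(num_executions, max_scale, exec_minutes, grace_minutes):
        queued, running, machines, minute = num_executions, [], 0, 0
        while queued or running:
            # Start queued work on idle machines, launching new ones up to Max. Scale.
            while queued and (len(running) < machines or machines < max_scale):
                if len(running) >= machines:
                    machines += 1  # launch a new machine
                running.append(exec_minutes)
                queued -= 1
            minute += 1  # advance one minute
            running = [m - 1 for m in running if m - 1 > 0]
        return minute, minute + grace_minutes

    # 10 executions of ~60 min, Max. Scale = 5, 15 min grace period:
    last_finish, terminated = simulate(10, max_scale=5, exec_minutes=60, grace_minutes=15)
    print(last_finish, terminated)  # two waves: last job ends ~120 min, machines gone ~135 min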

Experimental Features

Enable Smart VM Selection

Purpose: Improve cache utilization by intelligently selecting which execution runs on which machine.

How it works: Instead of first-in-first-out queuing, workers prioritize jobs with data already cached locally. This reduces download times and accelerates startup.

When to enable:

  • Environments with frequently-reused large datasets

  • Teams running similar experiments repeatedly (e.g., model architecture sweeps)

  • On-premises servers with persistent cache storage

Activation time: Allow 1-2 hours for the feature to activate after enabling.

Status: Experimental — behavior may change without notice.

Enable Extended Stats: Smart VM Selection requires the "Enable extended stats" setting to also be enabled; extended stats collects the historical execution data used for cache-aware scheduling.
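Conceptually, the cache-aware selection can be pictured as a scoring step: among queued jobs, the worker prefers the one with the most input data already in its local cache, falling back to submission order on ties. The sketch below only illustrates that idea with made-up job and dataset names; it is not the feature's actual implementation.

    # Illustration of cache-aware job selection (not the real implementation).
    # Job and dataset names are made up for the example.
    def pick_next_job(queued_jobs, cached_datasets):
        def cached_megabytes(job):
            return sum(size for name, size in job["inputs"].items() if name in cached_datasets)
        # max() keeps the first (oldest) job on ties, i.e. FIFO as the fallback.
        return max(queued_jobs, key=cached_megabytes)

    queue = [
        {"id": "exec-1", "inputs": {"imagenet-subset": 40_000, "labels": 5}},
        {"id": "exec-2", "inputs": {"fresh-dataset": 60_000}},
    ]
    print(pick_next_job(queue, cached_datasets={"imagenet-subset"})["id"])  # exec-1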

Environment Types

Different environment types have different configuration needs:

Cloud Auto-Scaling (AWS, GCP, Azure)

Characteristics:

  • Machines launch on-demand

  • Scaling settings actively used

  • Grace periods reduce startup latency

Recommended settings:

  • Min. Scale = 0 (cost-optimized)

  • Max. Scale = Based on cloud quotas and budget

  • Grace Period = 15-30 min (balance cost vs convenience)

On-Premises Dispatch Workers

Characteristics:

  • Fixed machines (not dynamically scaled)

  • Scaling settings less relevant

  • Grace periods irrelevant (machines always running)

Recommended settings:

  • Min. Scale = Max. Scale = Number of physical machines

  • Enable Smart VM Selection for efficient cache use

Kubernetes

Characteristics:

  • Resource allocation via Kubernetes runtime config

  • Scaling settings typically not used

  • Kubernetes manages pod scheduling

Recommended settings:

  • Configured through Kubernetes manifests rather than Valohai settings

See Kubernetes Workers for detailed Kubernetes configuration.

Common Configuration Patterns

Cost-Optimized Development

Environment: aws-cpu-small
- Enabled: Yes
- Allow Personal Usage: Yes
- Scale-Down Grace Period: 10 min
- Min. Scale: 0
- Max. Scale: 10
- Per-User Quota: 0 (no limit)

Aggressive scale-down, no reserved capacity, suitable for inexpensive experimentation.

Production Inference

Environment: aws-gpu-inference
- Enabled: Yes
- Allow Personal Usage: No
- Scale-Down Grace Period: 60 min
- Min. Scale: 1
- Max. Scale: 3
- Per-User Quota: 0

Keep one machine warm for immediate response, longer grace period for burst traffic.

Expensive Multi-GPU Training

Environment: aws-p4d-24xlarge
- Enabled: Yes
- Allow Personal Usage: No
- Scale-Down Grace Period: 5 min
- Min. Scale: 0
- Max. Scale: 2
- Per-User Quota: 1

Strict limits due to high cost ($32/hour), quick scale-down, one machine per user.

Hyperparameter Sweep Environment

Environment: aws-spot-g4dn
- Enabled: Yes
- Allow Personal Usage: Yes
- Scale-Down Grace Period: 15 min
- Min. Scale: 0
- Max. Scale: 20
- Per-User Quota: 0
- Enable Smart VM Selection: Yes

High parallelism for sweeps, cache optimization for repeated experiments, spot instances for cost savings.
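If you maintain several such patterns, a small script can sanity-check the values before you enter them in the UI. The helper below is a hypothetical local check, not a Valohai feature, and the rules are examples to adapt to your own policies.

    # Hypothetical local sanity checks for environment settings (not a Valohai feature).
    def check_settings(name, settings):
        problems = []
        if settings["min_scale"] > settings["max_scale"]:
            problems.append("Min. Scale exceeds Max. Scale")
        if settings["per_user_quota"] > settings["max_scale"]:
            problems.append("Per-User Quota exceeds Max. Scale")
        return name, problems or ["looks consistent"]

    print(check_settings("aws-p4d-24xlarge",
                         {"min_scale": 0, "max_scale": 2, "per_user_quota": 1}))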
