# Configure Environments & Scaling

Environments represent different machine types available for running executions — from small CPU instances to multi-GPU servers. Organization administrators configure scaling behavior, access controls, and experimental features for each environment.

## Access Environment Settings

1. Click **Hi, \<username>** in the top-right corner
2. Select **Manage \<organization>**
3. Open the **Environments** tab
4. Click on an environment to configure its settings

## Core Settings

### Enabled

**Purpose:** Control whether users can select this environment for executions.

**When to disable:**

* Deprecated machine types being phased out
* Environments under maintenance
* Cost-prohibitive machines that need explicit approval before use

**Effect:** Disabled environments don't appear in environment selection dropdowns.

### Allow Personal Usage

**Purpose:** Determine whether users can use this environment in projects not owned by your organization.

**Use cases:**

* **Enabled:** Users can experiment with personal side projects using organization infrastructure
* **Disabled:** Restrict expensive machines to organization-owned projects only

**Example:** Enable for cheap CPU machines, disable for expensive multi-GPU instances.

### Scale-Down Grace Period

**Purpose:** Set how many minutes Valohai waits after the last execution completes before terminating the machine.

**Default:** 15 minutes

**Why it matters:** Starting new machines takes time (boot, Docker pull, data download). Keeping machines warm for a short period lets you launch follow-up executions without waiting for infrastructure setup.

**Tuning guidance:**

* **Short grace period (5-10 min):** Cost-sensitive, infrequent workloads
* **Medium grace period (15-20 min):** Balanced for iterative development
* **Long grace period (30-60 min):** Rapid experimentation, pipeline workflows

**Example:** Set 30 minutes for GPU machines used in active development, 5 minutes for batch processing machines.
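
The scale-down decision described above can be sketched as a simple time comparison. This is a minimal illustration of the behavior, not Valohai's actual implementation; the function and variable names are hypothetical:

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(minutes=15)  # the documented default

def should_terminate(last_execution_finished: datetime, now: datetime,
                     queued_executions: int) -> bool:
    """Terminate an idle machine only once the grace period has elapsed
    and nothing is waiting in the queue."""
    if queued_executions > 0:
        return False
    return now - last_execution_finished >= GRACE_PERIOD

# A machine idle for 20 minutes with an empty queue is terminated;
# one idle for only 5 minutes stays warm for follow-up executions.
finished = datetime(2024, 1, 1, 12, 0)
print(should_terminate(finished, finished + timedelta(minutes=20), 0))  # → True
print(should_terminate(finished, finished + timedelta(minutes=5), 0))   # → False
```

Note that a non-empty queue always keeps the machine alive, regardless of how long it has been idle.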

### Min. Scale

**Purpose:** Set the minimum number of machines kept running at all times.

**Default:** 0 (no machines kept warm)

**When to use:**

* **Min. Scale > 0:** Keep machines warm for latency-sensitive workflows (e.g., real-time inference, rapid prototyping)
* **Min. Scale = 0:** Standard cost-optimized approach

**Cost impact:** Machines count toward billable hours even when idle.

**Example:** Set Min. Scale = 1 for a production inference environment to eliminate cold-start latency.
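
Because warm machines bill even when idle, the baseline cost of a non-zero Min. Scale is easy to estimate. The arithmetic below is purely illustrative; the hourly rate is a made-up example, not a Valohai price:

```python
def idle_cost_per_day(min_scale: int, hourly_rate_usd: float,
                      idle_hours_per_day: float) -> float:
    """Worst-case daily cost of machines kept warm by Min. Scale."""
    return min_scale * hourly_rate_usd * idle_hours_per_day

# One warm GPU machine at a hypothetical $3.06/hour, idle 16 h/day:
print(round(idle_cost_per_day(1, 3.06, 16), 2))  # → 48.96
```

Weigh this standing cost against the cold-start latency you eliminate.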

### Max. Scale

**Purpose:** Set the maximum number of machines Valohai can launch in parallel.

**Default:** 5 machines

**Considerations:**

* Cloud provider quotas (AWS vCPU limits, GCP GPU quotas)
* Budget constraints
* Concurrency needs for hyperparameter sweeps

**Example:** Set Max. Scale = 20 for teams running large hyperparameter sweeps, 2 for expensive multi-GPU machines.

After reaching the maximum, new executions queue until machines become available.

### Per-User Quota

**Purpose:** Set the maximum number of machines one user can run simultaneously on this environment.

**Default:** 0 (no limit)

**When to set:**

* Prevent single users from monopolizing expensive resources
* Distribute limited GPU quota fairly across team members
* Enforce cost controls on a per-user basis

**Example:** Set Per-User Quota = 2 for A100 GPU machines so no individual can consume all capacity.
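
The quota check amounts to a single comparison, with 0 treated as "unlimited" per the default above. A minimal sketch (hypothetical names, not Valohai code):

```python
def can_launch(user_running: int, per_user_quota: int) -> bool:
    """A quota of 0 means unlimited, matching the documented default."""
    return per_user_quota == 0 or user_running < per_user_quota

# With Per-User Quota = 2, a user already running 2 machines must wait:
print(can_launch(2, 2))   # → False
print(can_launch(1, 2))   # → True
print(can_launch(10, 0))  # → True (no limit)
```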

## Scaling Behavior

Valohai auto-scales based on queue depth:

**Execution submitted → Machine not available → Valohai launches new machine → Execution starts**

**No queued executions → Grace period expires → Valohai terminates machine**

**Max scale reached → New executions queue → Start when machine becomes available**

### Example Scenario

**Environment:** `aws-gpu-v100` (Max. Scale = 5, Grace Period = 15 min)

1. User submits 10 executions
2. Valohai launches 5 machines (max scale limit)
3. 5 executions start immediately, 5 queue
4. As executions complete, queued jobs start on available machines
5. After last execution finishes, machines stay warm for 15 minutes
6. If no new executions submitted, machines terminate after grace period
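
The dispatch step of this scenario can be modeled in a few lines. This is a simplified sketch of the scaling rule, not the actual scheduler:

```python
def dispatch(submitted: int, running: int, max_scale: int):
    """Return (started, queued) when `submitted` executions arrive on an
    environment with `running` busy machines and a Max. Scale limit."""
    capacity = max(max_scale - running, 0)
    started = min(submitted, capacity)
    return started, submitted - started

# The aws-gpu-v100 scenario above: 10 executions, Max. Scale = 5, none running.
print(dispatch(10, 0, 5))  # → (5, 5): 5 start immediately, 5 queue
```

As running executions complete, `capacity` grows again and queued jobs start in turn.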

## Experimental Features

### Enable Smart VM Selection

**Purpose:** Improve cache utilization by intelligently selecting which execution runs on which machine.

**How it works:** Instead of first-in-first-out queuing, workers prioritize jobs with data already cached locally. This reduces download times and accelerates startup.

**When to enable:**

* Environments with frequently-reused large datasets
* Teams running similar experiments repeatedly (e.g., model architecture sweeps)
* On-premises servers with persistent cache storage

**Activation time:** Allow 1-2 hours for the feature to activate after enabling.

**Status:** Experimental — behavior may change without notice.

> **Enable Extended Stats:** Smart VM Selection requires "Enable extended stats" to also be enabled. This collects historical execution data used for cache-aware scheduling.
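
As a rough sketch of the idea (not the actual Valohai scheduler), cache-aware selection replaces FIFO with "most cached data wins, FIFO breaks ties":

```python
def pick_next_job(queue, cached_inputs):
    """Prefer the queued job with the most input data already cached on
    this worker; ties fall back to FIFO (submission) order.
    `queue` is a list of (job_id, set_of_input_uris) in submission order."""
    if not queue:
        return None
    best = max(enumerate(queue),
               key=lambda item: (len(item[1][1] & cached_inputs), -item[0]))
    return best[1][0]

# A worker that already cached dataset-a picks job-2 ahead of the older job-1:
queue = [("job-1", {"dataset-b"}), ("job-2", {"dataset-a"})]
print(pick_next_job(queue, {"dataset-a"}))  # → job-2
```

With an empty cache the same function degrades gracefully to plain FIFO, which is the non-experimental behavior described above.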

## Environment Types

Different environment types have different configuration needs:

### Cloud Auto-Scaling (AWS, GCP, Azure)

**Characteristics:**

* Machines launch on-demand
* Scaling settings actively used
* Grace periods reduce startup latency

**Recommended settings:**

* Min. Scale = 0 (cost-optimized)
* Max. Scale = Based on cloud quotas and budget
* Grace Period = 15-30 min (balances cost against convenience)

### On-Premises Dispatch Workers

**Characteristics:**

* Fixed machines (not dynamically scaled)
* Scaling settings less relevant
* Grace periods irrelevant (machines always running)

**Recommended settings:**

* Min. Scale = Max. Scale = Number of physical machines
* Enable Smart VM Selection for efficient cache use

### Kubernetes

**Characteristics:**

* Resource allocation via Kubernetes runtime config
* Scaling settings typically not used
* Kubernetes manages pod scheduling

**Recommended settings:**

* Configured through Kubernetes manifests rather than Valohai settings

See [Kubernetes Workers](https://docs.valohai.com/installation-and-setup/kubernetes) for detailed Kubernetes configuration.

## Common Configuration Patterns

### Cost-Optimized Development

```
Environment: aws-cpu-small
- Enabled: Yes
- Allow Personal Usage: Yes
- Scale-Down Grace Period: 10 min
- Min. Scale: 0
- Max. Scale: 10
- Per-User Quota: 0 (no limit)
```

Aggressive scale-down, no reserved capacity, suitable for inexpensive experimentation.

### Production Inference

```
Environment: aws-gpu-inference
- Enabled: Yes
- Allow Personal Usage: No
- Scale-Down Grace Period: 60 min
- Min. Scale: 1
- Max. Scale: 3
- Per-User Quota: 0
```

Keep one machine warm for immediate response, longer grace period for burst traffic.

### Expensive Multi-GPU Training

```
Environment: aws-p4d-24xlarge
- Enabled: Yes
- Allow Personal Usage: No
- Scale-Down Grace Period: 5 min
- Min. Scale: 0
- Max. Scale: 2
- Per-User Quota: 1
```

Strict limits due to high cost ($32/hour), quick scale-down, one machine per user.

### Hyperparameter Sweep Environment

```
Environment: aws-spot-g4dn
- Enabled: Yes
- Allow Personal Usage: Yes
- Scale-Down Grace Period: 15 min
- Min. Scale: 0
- Max. Scale: 20
- Per-User Quota: 0
- Enable Smart VM Selection: Yes
```

High parallelism for sweeps, cache optimization for repeated experiments, spot instances for cost savings.

## Related Topics

* [Team Quotas](https://docs.valohai.com/user-and-organization-management/environments-and-access-control/team-quotas) — Set per-team limits for environments
* [Dynamic GPU Allocation](https://docs.valohai.com/executions/advanced-features/dynamic-gpu-allocation) — Split GPUs on multi-GPU machines
* [Spot Instances](https://docs.valohai.com/executions/advanced-features/spot-instances) — Use interruptible VMs for cost savings
* [Kubernetes Autoscaling](https://docs.valohai.com/installation-and-setup/kubernetes/kubernetes-autoscaling) — Configure Kubernetes-based scaling
