# System Environment Variables

Valohai automatically sets environment variables in every execution to provide context about your job. You can set additional variables to control how the agent handles the execution.

## Execution Context Variables

These variables give you information about the current execution:

```shell
# Execution paths
VH_CONFIG_DIR=/valohai/config           # Configuration files
VH_INPUTS_DIR=/valohai/inputs           # Downloaded input files
VH_OUTPUTS_DIR=/valohai/outputs         # Files to upload
VH_REPOSITORY_DIR=/valohai/repository   # Git repository code (working directory)

# Execution identifiers
VH_JOB_ID=exec-016eb6ec-50cb-0031-3f48-d556e47b1c78       # Job UUID
VH_EXECUTION_ID=016eb6ec-50cb-0031-3f48-d556e47b1c78      # Execution ID
VH_PROJECT_ID=04a37c09-dbe1-4c01-b715-0a3223c50188        # Project ID

# Pipeline context (when applicable)
VH_TASK_ID=f9c97759-513e-44a1-9666-97cf198cde80           # Task ID
VH_PIPELINE_ID=f403603b-ad11-4cc4-a90d-3118f51c8dcd       # Pipeline ID
VH_PIPELINE_NODE_ID=972834e2-23b5-429a-9f6d-80b8c4a75c8a  # Pipeline node ID

# TPU
VH_TPU= # Contains the GRPC endpoint(s) of the allocated Cloud TPU(s), separated by spaces (when TPUs are available)

# Distributed tasks context
VH_DIST_MEMBER_ID=abc123dfg # The name of this execution in the distributed execution group
VH_DIST_MEMBER_INDEX=1 # Zero-based index of this execution in the distributed execution group
VH_DIST_MEMBER_COUNT=3 # Number of executions in the distributed execution group

# Cloud instance IPs
VH_PUBLIC_IP=13.53.65.91    # Public IP (falls back to private if unavailable)
VH_LOCAL_IP=172.16.0.1      # Local/private IP address of cloud instance
```

> 💡 *Use these paths to read inputs and write outputs in your code instead of hardcoding locations.*
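
As a minimal sketch of that advice, the snippet below resolves the input and output directories from the environment and falls back to the default paths listed above, so the same script also runs outside a Valohai execution. The input name `dataset` and the file name `metrics.txt` are hypothetical, and the `<inputs dir>/<input name>` layout is an assumption made for illustration.

```shell
# Resolve directories from the environment; fall back to the defaults
# listed above when the variables are not set (e.g. on a laptop).
inputs_dir="${VH_INPUTS_DIR:-/valohai/inputs}"
outputs_dir="${VH_OUTPUTS_DIR:-/valohai/outputs}"

# Hypothetical names: an input called "dataset" and an output file "metrics.txt".
# Assumes each input's files sit in a subdirectory named after the input.
dataset_dir="$inputs_dir/dataset"
metrics_path="$outputs_dir/metrics.txt"

echo "Reading input files from: $dataset_dir"
echo "Writing results to: $metrics_path"
```

Building every path from these two variables keeps the script independent of where the directories are mounted.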

## Agent Behavior Controls

Set these variables to modify how the Valohai agent handles your execution.

> :exclamation: These must be defined in your **step definition** to take effect.
>
> Environment variables set via `export` in the `commands` section only exist inside the container and won't affect agent behavior.

### **Interactive Terminal**

#### VH\_INTERACTIVE

Enables the use of [Interactive Terminal](https://docs.valohai.com/development-and-debugging/interactive-terminal).

```shellscript
VH_INTERACTIVE=1     # true values: 1, yes, true
```

### **Data**

#### **VH\_NO\_DATA\_CACHE**

Ignore cached input data and re-download from source.

```bash
VH_NO_DATA_CACHE=1      # true values: 1, yes, true
```

Useful when you've reused URLs but the underlying data has changed.

#### **VH\_NO\_OUTPUT\_CACHE**

Prevent caching of produced output files.

```shellscript
VH_NO_OUTPUT_CACHE=1     # true values: 1, yes, true
```

Useful when your execution produces large outputs that won't be reused.\
Skipping the cache saves disk space for future executions.

#### **VH\_CLEAN**

Remove all Docker images and cached data before and after execution.

```bash
VH_CLEAN=true     # true values: 1, yes, true
```

This increases execution time but ensures a clean environment.

#### VH\_NO\_INPUT\_HASHING

Whether to skip hashing input files. This can speed up initialization of the execution, at the expense of not having the data hashed for integrity checking.

```shellscript
VH_NO_INPUT_HASHING=1     # true values: 1, yes, true
```

#### VH\_NO\_OUTPUT\_HASHING

Whether to skip hashing output files. This can speed up finalization of the execution, at the expense of not having the data hashed for integrity checking.

```shellscript
VH_NO_OUTPUT_HASHING=1     # true values: 1, yes, true
```

#### VH\_RENAME\_DUPLICATES

Whether to rename input files with duplicate filenames to avoid conflicts. Clashing filenames get a suffix made of an underscore and a counter, for example `file.txt`, `file_2.txt`, and so on.

```shellscript
VH_RENAME_DUPLICATES=1     # true values: 1, yes, true
```
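
The suffixing scheme can be sketched as follows. This illustrates the naming pattern described above and is not the agent's actual implementation: `dedupe_name` is a hypothetical helper, and continuing the counter with `_3`, `_4`, and so on is an assumption extrapolated from the example filenames.

```shell
# Sketch of the suffixing scheme (not the agent's actual code).
# $1 = original filename
# $2 = how many times this filename has been seen so far (1, 2, 3, ...)
dedupe_name() {
  name=$1
  count=$2
  if [ "$count" -le 1 ]; then
    printf '%s\n' "$name"   # first occurrence keeps its name
    return
  fi
  case $name in
    # Name with an extension: insert the suffix before the last dot.
    ?*.*) printf '%s_%s.%s\n' "${name%.*}" "$count" "${name##*.}" ;;
    # No extension: append the suffix at the end.
    *)    printf '%s_%s\n' "$name" "$count" ;;
  esac
}

dedupe_name file.txt 1
dedupe_name file.txt 2
dedupe_name file.txt 3
```

If your code looks up inputs by filename, keep in mind that the renamed copies will not match the original name.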

#### VH\_ENABLE\_DATASET\_VERSION\_PACKAGING

Whether to enable the dataset version packaging feature. Defaults to false.

```shellscript
VH_ENABLE_DATASET_VERSION_PACKAGING=1     # true values: 1, yes, true
```

### **Configuration Files**

Each [configuration file](https://docs.valohai.com/executions/system-configuration-files) is written in both **.json** and **.yaml** format.\
If you don't need the .yaml versions (they are slightly more human-readable, but machines read JSON just as well), you can disable them with:

```shellscript
VH_YAML_CONFIG_FILES=0     # false values: 0, no, false
```

> :bulb: `/valohai/config/inputs.yaml` contains details about each requested file. If your execution requests a large number of files (more than 20k), generating and writing the .yaml version of this configuration file can take a while, even a few minutes.\
> If your execution does not rely on the .yaml version of this file, you can disable it to speed up execution startup.

### **Logging**

#### VH\_INPUT\_LOGGING

During the input download phase, Valohai logs the status of each requested file (downloaded, found in cache, or on-demand). These messages can clutter the logs and make the rest of the output harder to inspect.\
This variable accepts three values:

* **`VH_INPUT_LOGGING=enable`**

  This is the default value. It logs the status of each input file.
* **`VH_INPUT_LOGGING=disable`**

  Suppresses all input processing logs.
* **`VH_INPUT_LOGGING=file`**

  Writes logs to `/tmp/peon/runs/exec-<execution-id>/input_processing.log` and shows only minimal logs in the UI (occasional counts of downloaded files, and errors).

  :exclamation: This file is written on the host machine, not inside the execution's container, so it is not accessible from within the execution.

### **Shared Cache**

> :exclamation: **Currently applicable only in Kubernetes environments**

When additional (shared) cache layers are used, they are expected to be remote or network file systems.\
When requested files are found in the local cache, Valohai exposes them to the execution container by creating **hard-links**. This is not possible for files found on a remote or network file system.

There are two possible behaviors, controlled by the environment variables below:

* **Copying** each file from the remote file system: the execution accesses actual files, but this may take longer than a regular download (depending on the number of files).
* Creating **sym-links**: the execution starts sooner but has to access data via sym-links. Although this is generally safe, some programs may misbehave.

#### VH\_ALLOW\_INPUT\_SYMLINK

By setting this environment variable to a truthy value, you instruct the Valohai agent to create sym-links pointing to the files on the remote file system.

```shellscript
VH_ALLOW_INPUT_SYMLINK=true     # true values: 1, yes, true
```

> :bulb: This behavior is **disabled** by default.

#### VH\_ALLOW\_INPUT\_COPY

Copying files is the default behavior (when **VH\_ALLOW\_INPUT\_SYMLINK** is disabled). It is also the fallback when sym-linking a file fails (when **VH\_ALLOW\_INPUT\_SYMLINK** is enabled).

```shellscript
VH_ALLOW_INPUT_COPY=1     # true values: 1, yes, true
```

> :bulb: This behavior is **enabled** by default.

The execution is stopped and marked as errored in either of these cases:

* **VH\_ALLOW\_INPUT\_SYMLINK=0 and VH\_ALLOW\_INPUT\_COPY=0:** Neither sym-linking nor copying is allowed.
* **VH\_ALLOW\_INPUT\_SYMLINK=1, VH\_ALLOW\_INPUT\_COPY=0, and sym-linking of a file fails:** Copying would normally be used as the fallback, but since it is disabled, the execution is stopped.
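
The interaction between the two variables can be summarized with a small decision function. This is a sketch of the rules described in this section, not the agent's actual code; `shared_cache_outcome` and its `1`/`0` arguments are hypothetical names chosen for illustration.

```shell
# Sketch of the decision rules (not the agent's actual code).
# $1 = sym-linking allowed (1 or 0)
# $2 = copying allowed (1 or 0)
# $3 = whether creating the sym-link would succeed (1 or 0)
shared_cache_outcome() {
  allow_symlink=$1
  allow_copy=$2
  symlink_ok=$3
  if [ "$allow_symlink" = 1 ] && [ "$symlink_ok" = 1 ]; then
    echo symlink
  elif [ "$allow_copy" = 1 ]; then
    echo copy    # default behavior, or fallback after a failed sym-link
  else
    echo error   # nothing left to try: the execution is stopped
  fi
}

shared_cache_outcome 0 1 1   # defaults: copy
shared_cache_outcome 1 1 0   # sym-link fails, copy is the fallback: copy
shared_cache_outcome 1 0 0   # sym-link fails, copy disabled: error
shared_cache_outcome 0 0 1   # both disabled: error
```

Keeping copying enabled alongside sym-linking is the safer combination, because a single failed sym-link then falls back to a copy instead of stopping the execution.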

### **Resource Limits**

#### VH\_CPU\_LIMIT

Limit CPU cores available to the job.

```bash
VH_CPU_LIMIT=2        # Use 2 cores
VH_CPU_LIMIT=0,2,4    # Use specific core indices
```

#### VH\_MEMORY\_LIMIT

Set memory usage limit.

```bash
VH_MEMORY_LIMIT=500M   # 500 megabytes
VH_MEMORY_LIMIT=2G     # 2 gigabytes
```

#### VH\_SHM\_SIZE

Increase the size of the shared memory directory (`/dev/shm`).

```bash
VH_SHM_SIZE=16G       # 16 gigabytes
```

Useful for applications that need large shared memory (e.g., PyTorch DataLoader).

### **Docker**

#### VH\_DOCKER\_NETWORK

Name of the network in which to create the execution container.

```shellscript
VH_DOCKER_NETWORK="training-net"
```

#### VH\_EXPOSE\_PORTS

Comma-separated list of port mappings (`<host-port>:<container-port>`) to expose on the execution container.

```shellscript
VH_EXPOSE_PORTS='8080:80, 22:22'
```
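
To double-check a value before putting it in a step definition, the mapping format can be split apart as sketched below. `list_port_mappings` is a hypothetical helper, and ignoring spaces around entries is an assumption based on the example value above; the agent's own parsing may differ.

```shell
# Print each mapping of a VH_EXPOSE_PORTS-style value on its own line.
# $1 = comma-separated list such as '8080:80, 22:22'
list_port_mappings() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r mapping; do
    mapping=$(printf '%s' "$mapping" | tr -d ' ')   # drop spaces around entries
    [ -n "$mapping" ] || continue                   # skip empty entries
    host_port=${mapping%%:*}                        # part before the colon
    container_port=${mapping##*:}                   # part after the colon
    echo "host=$host_port container=$container_port"
  done
}

list_port_mappings '8080:80, 22:22'
```

Each printed line pairs the host port with the container port, which makes a swapped pair easy to spot.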

#### VH\_INIT

Whether to use an `init`-like daemon in the Docker container. Defaults to **true**, since most workloads don't properly handle signals. In case a workload does handle signals properly, this can be set to a falsy value.

```shellscript
VH_INIT=0     # false values: 0, no, false
```

#### VH\_ALLOW\_ENTRYPOINT

Whether to use the `ENTRYPOINT` defined in the Docker image.

```shellscript
VH_ALLOW_ENTRYPOINT=1     # true values: 1, yes, true
```

#### **VH\_NO\_IMAGE\_CACHE**

Force Docker image re-pull, ignoring local cache.

```bash
VH_NO_IMAGE_CACHE=true     # true values: 1, yes, true
```

### **Storage Behavior**

#### VH\_TMPFS

Control whether `/tmp` writes to memory or disk.

```bash
VH_TMPFS=0    # false values: 0, no, false
```

By default, `/tmp` is a memory filesystem. Setting this to false writes to disk instead, which is slower but avoids out-of-memory errors for large temporary files.

### GPU

#### VH\_RESET\_GPU

Whether to issue an NVIDIA GPU reset command before running the container.

```shellscript
VH_RESET_GPU=1    # true values: 1, yes, true
```

#### VH\_XORG

Whether to start an Xorg server for the execution's duration.

```shellscript
VH_XORG=1    # true values: 1, yes, true
```

### Code

#### VH\_CHOWN\_REPOSITORY

Whether to try to `chown` the repository directory to the user running the container.

```shellscript
VH_CHOWN_REPOSITORY=1    # true values: 1, yes, true
```
