Executions

An execution is how Valohai runs your machine learning code on remote infrastructure. If you've used other ML platforms, think of executions as "jobs" or "experiments," but designed as composable building blocks within a larger workflow.

What is an execution?

An execution runs one or more commands on a remote server with a specific configuration. Each execution is tied to a step (defined in your valohai.yaml), but you can run the same step multiple times with different:

  • Parameters

  • Input files

  • Hardware (GPU/CPU types)

  • Docker images
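As a sketch of what this looks like in practice, a step in valohai.yaml can declare default parameters and inputs, and each execution can override them at launch time (the parameter name, input name, and dataset URL below are illustrative placeholders):

```yaml
- step:
    name: train-model
    image: tensorflow/tensorflow:2.13.0-gpu
    command: python train.py --epochs {parameter:epochs}
    parameters:
      - name: epochs
        type: integer
        default: 10
    inputs:
      - name: dataset
        default: s3://example-bucket/train.csv  # placeholder URL
```

Launching the same step twice with different values for `epochs` or `dataset` produces two separate, individually tracked executions.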

💡 Every training run, data preprocessing task, or evaluation script becomes an execution in Valohai.

Three components shape your execution

1. Environment

The compute infrastructure where your code runs.

  • Machine type: GPU instances for training, high-memory instances for data processing

  • Cloud provider or on-premises: AWS, Azure, GCP, Kubernetes, or your own hardware

  • Example: Train neural networks on p3.8xlarge (4x V100 GPUs), run feature extraction on m5.24xlarge (384GB RAM)
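If a step should always run on a particular machine type, the environment can also be pinned per step in valohai.yaml. The slug below is a placeholder; the actual slugs available depend on your Valohai setup:

```yaml
- step:
    name: train-model
    image: tensorflow/tensorflow:2.13.0-gpu
    command: python train.py
    environment: aws-eu-west-1-p3-8xlarge  # placeholder environment slug
```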

2. Docker image

The software environment containing your dependencies. You can use:

  • Pre-built images for TensorFlow, PyTorch, or scikit-learn

  • Custom images with your specific library versions

  • Images hosted on Docker Hub, AWS ECR, or private registries

3. Repository code

The version of your code that runs on the worker. How it works:

  • Valohai clones your Git repository at a specific commit

  • Code is available at /valohai/repository (your working directory)

  • Same commit = same code = full reproducibility

Running local code with --adhoc:

During development, you often want to test changes without committing to Git. Use the --adhoc flag to run your local code directly:

vh execution run --adhoc train-model

This packages your local changes, uploads them to your data store, and downloads them onto the worker for the execution. Everything stays fully reproducible: Valohai tracks the exact code snapshot used.
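Conceptually, an adhoc snapshot behaves like a content-addressed archive of your working directory: identical file contents always yield the same snapshot identifier, which is what makes the run reproducible. The sketch below illustrates that idea with the standard library only; it is not Valohai's actual implementation.

```python
import hashlib
import io
import tarfile
from pathlib import Path


def snapshot(directory: str) -> tuple[bytes, str]:
    """Pack a directory into a tar archive and compute a digest for it.

    Illustrative only (not Valohai's code): file metadata is normalized
    so identical file contents always produce the identical digest.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        # Sort paths so archive order is deterministic.
        for path in sorted(Path(directory).rglob("*")):
            if path.is_file():
                info = tarfile.TarInfo(name=str(path.relative_to(directory)))
                info.size = path.stat().st_size
                info.mtime = 0  # normalize timestamps for reproducibility
                with path.open("rb") as f:
                    tar.addfile(info, f)
    data = buffer.getvalue()
    return data, hashlib.sha256(data).hexdigest()
```

Uploading the archive and recording its digest gives the same guarantee as a Git commit: the digest identifies exactly the code that ran.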

Creating executions

Executions can be defined and launched in three ways:

1. Define steps in valohai.yaml

- step:
    name: train-model
    image: tensorflow/tensorflow:2.13.0-gpu
    command: python train.py --epochs {parameter:epochs}

2. Use 🐍 valohai-utils (Python)

import valohai

# valohai-utils generates the YAML step definition from this call
valohai.prepare(
    step="train-model",
    image="tensorflow/tensorflow:2.13.0-gpu",
    default_parameters={"epochs": 10},
)

3. Launch via:

  • Web UI: Point-and-click parameter selection

  • CLI: vh execution run train-model --adhoc

  • API: Programmatic execution management
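For the API route, launching an execution amounts to an authenticated POST request. The sketch below uses only the standard library and builds the request without sending it; the endpoint path and payload fields are assumptions based on Valohai's REST API conventions, so consult the API documentation for the exact schema:

```python
import json
import urllib.request

# Hypothetical payload: project/commit IDs and parameter values are
# placeholders, not real identifiers.
payload = {
    "project": "PROJECT_ID",
    "commit": "COMMIT_SHA",
    "step": "train-model",
    "parameters": {"epochs": 20},
}

request = urllib.request.Request(
    "https://app.valohai.com/api/v0/executions/",  # assumed endpoint
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Token YOUR_API_TOKEN",  # placeholder token
        "Content-Type": "application/json",
    },
)

# urllib.request.urlopen(request) would submit the execution.
```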
