Defining Jobs

💡 About this tutorial: We use YOLOv8 as a practical example to demonstrate Valohai's features. You don't need computer vision knowledge—the patterns you learn here apply to any ML framework. This tutorial focuses on defining jobs (called executions in Valohai) and saving their output files so that your ML workflows stay properly versioned and tracked.

Prerequisites

Install the CLI

Get the Valohai CLI for managing projects and executions:

pip install valohai-cli

Tip: Use pipx install valohai-cli to avoid dependency conflicts.

Login

vh login
Using SSO or self-hosted Valohai?

SSO Users (Azure, Google, SAML)

Generate a personal access token:

  1. Go to Hi, [username] → My Profile → Authentication

  2. Click Manage Tokens → Create a new personal token

  3. Save the token securely (shown only once)

vh login --token <your-token>

Self-Hosted Installations

vh login --host https://your-company.valohai.io

💡 Combine options: vh login --host https://your-company.valohai.io --token <token>

Create Your First Project

Set up a project directory and connect it to Valohai:
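For example (a sketch: the directory name is arbitrary, and `vh project create` both creates a new Valohai project and links the current directory to it; use `vh project link` if the project already exists):

```shell
mkdir yolo-quickstart && cd yolo-quickstart
vh project create
```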

This links your local directory to Valohai for experiment tracking.

Write Your Training Script

Save this as train.py. Half of it is standard YOLO model training; the other half is Valohai-specific, where we:

  • Copy the trained model files to the /valohai/outputs/ directory, from which Valohai will version and upload them

  • The YOLO training script will produce several files, including one called best.onnx

  • We want to create a Valohai alias called latest-model that always points to the newest file generated by this job (we'll use it for inference later)

  • Generate a sidecar file called best.onnx.metadata.json containing a small piece of JSON that tells Valohai to point the latest-model alias at the newly generated file. Read more about aliases in our alias docs.
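The Valohai-specific half can be sketched like this (the save_model_with_alias helper and the demo paths are our illustrations, not part of the YOLO or Valohai APIs; the {"valohai.alias": ...} sidecar format follows Valohai's metadata convention):

```python
import json
import shutil
from pathlib import Path

# Valohai versions and uploads anything written under /valohai/outputs/.
VH_OUTPUTS = Path("/valohai/outputs")

def save_model_with_alias(model_path, outputs_dir, alias="latest-model"):
    """Copy a trained model into the outputs dir and write the
    <name>.metadata.json sidecar that attaches a Valohai alias to it."""
    outputs_dir = Path(outputs_dir)
    outputs_dir.mkdir(parents=True, exist_ok=True)
    target = outputs_dir / Path(model_path).name
    shutil.copy(model_path, target)
    # Sidecar file: tells Valohai to point the alias at this new file.
    sidecar = outputs_dir / (target.name + ".metadata.json")
    sidecar.write_text(json.dumps({"valohai.alias": alias}))
    return target, sidecar

if __name__ == "__main__":
    import tempfile
    # Demo with a throwaway file; in a real execution you would pass the
    # best.onnx produced by YOLO training and VH_OUTPUTS as outputs_dir.
    with tempfile.TemporaryDirectory() as tmp:
        fake_model = Path(tmp) / "best.onnx"
        fake_model.write_bytes(b"\x00")
        target, sidecar = save_model_with_alias(fake_model, Path(tmp) / "outputs")
        print(sidecar.read_text())  # {"valohai.alias": "latest-model"}
```

Inside a real execution you would call the helper once after training finishes, with the best.onnx path YOLO reports.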

Configure Your Execution Environment

Create valohai.yaml to define how your code runs:
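A minimal valohai.yaml for this tutorial might look like the following sketch (the step name, image, and environment slug are placeholders to adapt):

```yaml
- step:
    name: train
    image: python:3.11                      # any image with your dependencies works
    environment: aws-eu-west-1-g4dn-xlarge  # placeholder slug; list options with `vh environments --gpu`
    command:
      - pip install ultralytics valohai-utils
      - python train.py
```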

Update the environment field with the GPU machine you want to run the job on. You can see a list of available GPU machines by running vh environments --gpu. Use the slug name provided in the output.

About Docker Images

The image field specifies your execution environment. Think of it as a clean environment with only the software you specify. For example:

  • Python: python:3.9, python:3.11

  • TensorFlow: tensorflow/tensorflow:2.13.0

  • PyTorch: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime

  • R: r-base:4.3.0

  • Custom: Your own Docker images with specific dependencies

Note: The Docker image's Python version can differ from your local one. Valohai runs your code in this isolated environment for reproducibility.

Run Your First Execution

Submit the job and watch logs in real-time:
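Assuming the step in your valohai.yaml is named train, the command looks like this:

```shell
vh execution run train --adhoc --watch
```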

What happens:

  • --adhoc uploads your current code without pushing to Git (great for testing)

  • --watch streams logs to your terminal

  • Valohai tracks all inputs, outputs, and parameters automatically

View Results in the UI

Open your execution in the browser:
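One way to get there (a sketch, assuming your CLI version includes the `vh project open` command) is to open the project page in your browser and click into the execution:

```shell
vh project open
```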

The UI shows:

  • Real-time logs and metrics charts

  • Parameter values and configuration

  • Output files (downloadable)

  • Full reproducibility information


Troubleshooting

Command 'vh' not found

The CLI isn't in your system PATH. Fix it:

macOS/Linux:
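A common fix, assuming pip placed the CLI in ~/.local/bin (the usual location for user installs); add the line to your shell profile (~/.bashrc or ~/.zshrc) to make it permanent:

```shell
# pip user installs typically land in ~/.local/bin
export PATH="$HOME/.local/bin:$PATH"
```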

Windows: Add the Python Scripts folder to your PATH environment variable.

Alternative: Run with python -m valohai_cli instead of vh.

Project not linked

Run vh project link to connect your directory to an existing Valohai project.

Import error for valohai module

Install the Valohai utilities as a part of your step in valohai.yaml:
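For example (a sketch; only the added pip install line matters, the rest of the step mirrors what you already have):

```yaml
- step:
    name: train
    image: python:3.11
    command:
      - pip install valohai-utils   # provides the `valohai` Python module
      - python train.py
```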
