Defining Jobs
Valohai runs your machine learning code as executions: tracked, versioned runs on remote machines. Each execution captures everything: your code version, parameters, outputs, logs, and the environment it ran in.
💡 About this tutorial: We use YOLOv8 as a practical example to demonstrate Valohai's features. You don't need computer vision knowledge—the patterns you learn here apply to any ML framework. This tutorial focuses on defining jobs (=executions) and saving their output files while ensuring proper versioning and tracking of your ML workflows.
Prerequisites
Python 3.8 or later
A Valohai account (create one free)
Install the CLI
Get the Valohai CLI and utilities for experiment tracking:
pip install valohai-cli
Tip: Use pipx install valohai-cli to avoid dependency conflicts.
Login
vh login
Create Your First Project
Set up a project directory and connect it to Valohai:
mkdir my-ml-project
cd my-ml-project
vh project create --name my-ml-project
This links your local directory to Valohai for experiment tracking.
Write Your Training Script
Save this as train.py. Half of it is the standard YOLO model training; the other half is Valohai-specific, where we:
Copy the trained model files to /valohai/outputs/, a directory from which Valohai will version and upload them.
The YOLO training run produces several files, including one called best.onnx. We want a Valohai alias called latest-model that we can reference and that will always point to the newest file generated by this job (we'll use it for inference later).
Generate a JSON file called best.onnx.metadata.json declaring that the alias latest-model should point to this newly generated file. Read more about aliases in our alias docs.
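Before we embed it in the full script, the sidecar pattern can be sketched on its own: for any file saved under /valohai/outputs/, a neighbouring "&lt;filename&gt;.metadata.json" can attach an alias. A minimal, hypothetical helper (the output_dir parameter stands in for /valohai/outputs/ so the sketch runs anywhere):

```python
import json
import os

def attach_alias(output_dir, filename, alias):
    # Write a "<filename>.metadata.json" sidecar next to the output file;
    # Valohai reads it when the execution finishes and applies the alias.
    sidecar = os.path.join(output_dir, filename + ".metadata.json")
    with open(sidecar, "w") as f:
        json.dump({"valohai.alias": alias}, f)
    return sidecar
```

On Valohai you would call attach_alias("/valohai/outputs", "best.onnx", "latest-model"), which is exactly what the script below does inline.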
import shutil
from ultralytics import YOLO
import json
# Load a pretrained model (recommended for training)
model = YOLO("yolov8n.pt")
# Train the model
model.train(data="coco128.yaml", epochs=1, verbose=False)
# Export the model to ONNX format
path = model.export(format="onnx")
# Valohai parts start here
# Copy the exported model to the Valohai outputs directory
shutil.copy(path, '/valohai/outputs/')
# Define a JSON dictionary containing a friendly name
# You can then reference this file with datum://latest-model
file_metadata = {
    "valohai.alias": "latest-model"
}
# Attach the metadata to the file
with open("/valohai/outputs/best.onnx.metadata.json", "w") as f:
    json.dump(file_metadata, f)
Configure Your Execution Environment
Create valohai.yaml to define how your code runs:
Update the environment field with the GPU machine you want to run the job on. You can see a list of available GPU machines by running vh environments --gpu. Use the slug name provided in the output.
- step:
    name: yolo
    image: docker.io/ultralytics/ultralytics:8.0.180-python
    command: python train.py
    environment: aws-eu-west-1-p3-2xlarge
About Docker Images
The image field specifies your execution environment. Think of it as a clean environment with only the software you specify. For example:
Python: python:3.9, python:3.11
TensorFlow: tensorflow/tensorflow:2.13.0
PyTorch: pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
R: r-base:4.3.0
Custom: Your own Docker images with specific dependencies
Note: The Docker image's Python version can differ from the one installed locally. Valohai runs your code in this isolated environment for reproducibility.
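The latest-model alias created by train.py can later be consumed as an input in another step of the same valohai.yaml. A sketch of a hypothetical inference step (the step name and predict.py are illustrative, not part of this tutorial's code):

```yaml
- step:
    name: inference
    image: docker.io/ultralytics/ultralytics:8.0.180-python
    command: python predict.py
    environment: aws-eu-west-1-p3-2xlarge
    inputs:
      - name: model
        default: datum://latest-model
```

Because the input points at the alias rather than a fixed file, each run of this step downloads whichever model the alias currently resolves to; inside the execution the file appears under /valohai/inputs/model/.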
Run Your First Execution
Submit the job and watch logs in real-time:
vh execution run yolo --adhoc --watch
What happens:
--adhoc uploads your current code without pushing to Git (great for testing)
--watch streams logs to your terminal
Valohai tracks all inputs, outputs, and parameters automatically
View Results in the UI
Open your execution in the browser:
vh execution open
The UI shows:
Real-time logs and metrics charts
Parameter values and configuration
Output files (downloadable)
Full reproducibility information
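The metrics charts are populated from your script's standard output: Valohai collects JSON objects printed one per line as execution metadata. A minimal sketch (the metric names here are illustrative):

```python
import json

def log_metrics(epoch, loss, accuracy):
    # Each JSON line printed to stdout is picked up by Valohai as
    # metadata and plotted in the execution's metrics charts.
    print(json.dumps({"epoch": epoch, "loss": loss, "accuracy": accuracy}))

log_metrics(1, 0.52, 0.81)
```

Call this once per epoch inside your training loop and the UI will chart the values over time.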