Working with Data
Valohai inputs let you feed data into your executions from multiple sources such as URLs, your data catalog, or outputs from other jobs. Here's how to configure and use them effectively.
💡 About this tutorial: We use YOLOv8 as a practical example to demonstrate Valohai's features. You don't need computer vision knowledge—the patterns you learn here apply to any ML framework. This tutorial focuses on ingesting data, running inference, and saving resulting files while ensuring proper versioning and tracking of your ML workflows.
Define Inputs in valohai.yaml
Add input groups to any step in your valohai.yaml:
- step:
    name: inference
    image: docker.io/ultralytics/ultralytics:8.0.180-python
    command: python inference.py
    inputs:
      - name: model
        default: datum://latest-model  # References data from your catalog
        filename: best.onnx            # Renames the file when downloaded
      - name: images
        default: https://ultralytics.com/images/bus.jpg
Input Configuration Options
name: Identifier for accessing files at /valohai/inputs/{name}/
default: Pre-filled source (URL, S3 path, or datum:// reference)
filename: Rename downloaded files for consistent access
💡 Use datum:// to reference any file in your project's data catalog, including outputs from previous executions.
Access Inputs in Your Code
All inputs download to /valohai/inputs/{input-name}/ before your code runs:
from ultralytics import YOLO
import os

# Inputs are always at predictable paths
path_to_model = "/valohai/inputs/model/best.onnx"
path_to_images = "/valohai/inputs/images/"

model = YOLO(path_to_model)

# Process all files in the images input
for image in os.listdir(path_to_images):
    image_path = os.path.join(path_to_images, image)
    if os.path.isfile(image_path):
        results = model.predict(image_path, save=True, project="/valohai/outputs", name="predictions")
        for r in results:
            print(r.boxes)
Run with Inputs
Execute your step and Valohai handles the downloads:
vh execution run inference --adhoc --open-browser
On the execution page, the Inputs section shows:
Source locations (URLs, S3 paths, or datum links)
File previews for supported formats
Download status and file sizes
Override Inputs at Runtime
From the UI
Copy any execution using the "Copy" button
Modify inputs in the form:
Add URLs directly
Browse and select from your data catalog
Reference outputs from other executions
Create the new execution
From the CLI
Override default inputs without modifying valohai.yaml:
# Single file
vh execution run inference --adhoc --model=s3://my-bucket/model.onnx
# Multiple files for one input
vh execution run inference --adhoc --images=https://example.com/img1.jpg --images=https://example.com/img2.jpg
Input Sources
Valohai accepts inputs from:
Direct URLs: Any publicly accessible HTTP/HTTPS endpoint
Cloud storage: S3, Azure Blob, GCS (with configured credentials)
Data catalog: Files uploaded to your project or outputs from executions
Version control: Files committed to your repository
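As a sketch, each downloadable source type maps onto an input default in valohai.yaml. The step name, bucket, and file paths below are placeholders, not values from this tutorial:

```yaml
- step:
    name: source-examples
    image: python:3.11
    command: python process.py
    inputs:
      - name: from-url
        default: https://example.com/data/sample.jpg  # direct URL
      - name: from-cloud-storage
        default: s3://my-bucket/models/model.onnx     # cloud storage (placeholder bucket)
      - name: from-catalog
        default: datum://latest-model                 # data catalog reference
```

Repository files need no input entry at all; anything committed alongside your code is already available in the working directory at runtime.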
Connecting Executions
Use outputs from one job as inputs to another:
Train a model → saves to /valohai/outputs/
Reference it with datum:// in your next step
Valohai tracks the lineage automatically
This creates traceable pipelines where you can see exactly which model version produced which predictions.
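For example, a downstream step can pull in the trained model through a datum reference. The alias name production-model below is a placeholder for an alias you would create in your project's data catalog and point at the training execution's output:

```yaml
- step:
    name: evaluate
    image: docker.io/ultralytics/ultralytics:8.0.180-python
    command: python evaluate.py
    inputs:
      - name: model
        default: datum://production-model  # placeholder alias for the training step's output
        filename: best.onnx
```

Because the input resolves through the catalog rather than a hardcoded URL, re-pointing the alias at a newer model re-wires every downstream step without touching valohai.yaml.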
💬 Best Practices
Use descriptive input names that indicate the expected file type
Set sensible defaults in valohai.yaml for common use cases
Override inputs at runtime for experimentation
Check file existence before processing to handle optional inputs gracefully
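The last point can be sketched in Python. The input name calibration-data is hypothetical; the path simply follows the /valohai/inputs/{name}/ convention described above:

```python
import os

# Hypothetical optional input; the directory only exists if files were provided
OPTIONAL_INPUT_DIR = "/valohai/inputs/calibration-data"

def optional_input_files(input_dir):
    """Return the files of an input that may not have been provided."""
    if not os.path.isdir(input_dir):
        return []  # input was left empty, or we're running outside Valohai
    return sorted(
        os.path.join(input_dir, name)
        for name in os.listdir(input_dir)
        if os.path.isfile(os.path.join(input_dir, name))
    )

files = optional_input_files(OPTIONAL_INPUT_DIR)
if files:
    print(f"Processing {len(files)} calibration file(s)")
else:
    print("No calibration data provided; falling back to defaults")
```

Guarding like this keeps the same script usable whether the optional input is filled at runtime or left empty.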