Working with Data

Valohai inputs let you feed data into your executions from multiple sources such as URLs, your data catalog, or outputs from other jobs. Here's how to configure and use them effectively.

💡 About this tutorial: We use YOLOv8 as a practical example to demonstrate Valohai's features. You don't need computer vision knowledge; the patterns here apply to any ML framework. This tutorial focuses on ingesting data, running inference, and saving the resulting files, all with proper versioning and tracking of your ML workflows.

Define Inputs in valohai.yaml

Add input groups to any step in your valohai.yaml:

- step:
    name: inference
    image: docker.io/ultralytics/ultralytics:8.0.180-python
    command: python inference.py
    inputs:
        - name: model
          default: datum://latest-model  # References data from your catalog
          filename: best.onnx            # Renames the file when downloaded
        - name: images
          default: https://ultralytics.com/images/bus.jpg

Input Configuration Options

  • name: Identifier for accessing files at /valohai/inputs/{name}/

  • default: Pre-filled source (URL, S3 path, or datum:// reference)

  • filename: Rename downloaded files for consistent access

💡 Use datum:// to reference any file in your project's data catalog, including outputs from previous executions.

Access Inputs in Your Code

All inputs download to /valohai/inputs/{input-name}/ before your code runs:

from ultralytics import YOLO
import os

# Inputs are always at predictable paths
path_to_model = "/valohai/inputs/model/best.onnx"
path_to_images = "/valohai/inputs/images/"

model = YOLO(path_to_model)

# Process all files in the images input
for image_name in os.listdir(path_to_images):
    image_path = os.path.join(path_to_images, image_name)
    
    if os.path.isfile(image_path):
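        # save=True makes YOLO write annotated copies of each image under
        # the given project directory; everything in /valohai/outputs/ is
        # uploaded and versioned once the execution finishes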
        results = model.predict(image_path, save=True, project="/valohai/outputs", name="predictions")
        
        for r in results:
            print(r.boxes)

Run with Inputs

Execute your step and Valohai handles the downloads:

vh execution run inference --adhoc --open-browser

On the execution page, the Inputs section shows:

  • Source locations (URLs, S3 paths, or datum links)

  • File previews for supported formats

  • Download status and file sizes

Override Inputs at Runtime

From the UI

  1. Copy any execution using the "Copy" button

  2. Modify inputs in the form:

    • Add URLs directly

    • Browse and select from your data catalog

    • Reference outputs from other executions

  3. Create the new execution

From the CLI

Override default inputs without modifying valohai.yaml:

# Single file
vh execution run inference --adhoc --model=s3://my-bucket/model.onnx

# Multiple files for one input
vh execution run inference --adhoc --images=https://example.com/img1.jpg --images=https://example.com/img2.jpg
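
# A file from your data catalog (the datum ID below is a made-up placeholder)
vh execution run inference --adhoc --model=datum://01890a12-3b45-6c78-9d01-23e456f789ab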

Input Sources

Valohai accepts inputs from:

  • Direct URLs: Any publicly accessible HTTP/HTTPS endpoint

  • Cloud storage: S3, Azure Blob, GCS (with configured credentials)

  • Data catalog: Files uploaded to your project or outputs from executions

  • Version control: Files committed to your repository

Connecting Executions

Use outputs from one job as inputs to another:

  1. Train a model → saves to /valohai/outputs/

  2. Reference it with datum:// in your next step

  3. Valohai tracks the lineage automatically

This creates traceable pipelines where you can see exactly which model version produced which predictions.
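As a sketch of step 1, the training code only needs to write into /valohai/outputs/ (the source path below is illustrative; use wherever your trainer writes its weights):

import shutil

# Any file written under /valohai/outputs/ is uploaded when the run ends
# and appears in the data catalog with its own datum:// ID
shutil.copy("runs/detect/train/weights/best.onnx", "/valohai/outputs/best.onnx")

A later step can then set that datum (or an alias like the datum://latest-model used above) as its model input, and Valohai links the two executions in the lineage graph.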

💬 Best Practices

  • Use descriptive input names that indicate the expected file type

  • Set sensible defaults in valohai.yaml for common use cases

  • Override inputs at runtime for experimentation

  • Check file existence before processing to handle optional inputs gracefully (see the sketch below)
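
For the last point, a minimal sketch, assuming a hypothetical input named labels declared with optional: true in valohai.yaml:

import os

labels_dir = "/valohai/inputs/labels/"

# An optional input that received no value leaves the directory empty or
# missing entirely, so guard before reading from it
if os.path.isdir(labels_dir) and os.listdir(labels_dir):
    label_files = sorted(os.listdir(labels_dir))
else:
    label_files = []  # proceed without labels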
