Working with Data

Valohai inputs let you feed data into your executions from multiple sources such as URLs, your data catalog, or outputs from other jobs. Here's how to configure and use them effectively.

💡 About this tutorial: We use YOLOv8 as a practical example to demonstrate Valohai's features. You don't need computer vision knowledge; the patterns here apply to any ML framework. This tutorial focuses on ingesting data, running inference, and saving the resulting files, all with proper versioning and tracking of your ML workflows.

Define Inputs in valohai.yaml

Add input groups to any step in your valohai.yaml:

- step:
    name: inference
    image: docker.io/ultralytics/ultralytics:8.0.180-python
    command: python inference.py
    inputs:
        - name: model
          default: datum://latest-model  # References data from your catalog
          filename: best.onnx            # Renames the file when downloaded
        - name: images
          default: https://ultralytics.com/images/bus.jpg

Input Configuration Options

  • name: Identifier for accessing files at /valohai/inputs/{name}/

  • default: Pre-filled source (URL, S3 path, or datum:// reference)

  • filename: Rename downloaded files for consistent access

💡 Use datum:// to reference any file in your project's data catalog, including outputs from previous executions.

Access Inputs in Your Code

All inputs download to /valohai/inputs/{input-name}/ before your code runs:

from ultralytics import YOLO
import os

# Inputs are always at predictable paths
path_to_model = "/valohai/inputs/model/best.onnx"
path_to_images = "/valohai/inputs/images/"

model = YOLO(path_to_model)

# Process all files in the images input
for image_name in os.listdir(path_to_images):
    image_path = os.path.join(path_to_images, image_name)
    
    if os.path.isfile(image_path):
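        # save=True makes YOLO write annotated copies of each image under
        # the given project directory; everything in /valohai/outputs/ is
        # uploaded and versioned once the execution finishes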
        results = model.predict(image_path, save=True, project="/valohai/outputs", name="predictions")
        
        for r in results:
            print(r.boxes)

Run with Inputs

Execute your step and Valohai handles the downloads:

vh execution run inference --adhoc --open-browser

On the execution page, the Inputs section shows:

  • Source locations (URLs, S3 paths, or datum links)

  • File previews for supported formats

  • Download status and file sizes

Override Inputs at Runtime

From the UI

  1. Copy any execution using the "Copy" button

  2. Modify inputs in the form:

    • Add URLs directly

    • Browse and select from your data catalog

    • Reference outputs from other executions

  3. Create the new execution

From the CLI

Override default inputs without modifying valohai.yaml:

# Single file
vh execution run inference --adhoc --model=s3://my-bucket/model.onnx

# Multiple files for one input
vh execution run inference --adhoc --images=https://example.com/img1.jpg --images=https://example.com/img2.jpg
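
# A file from your data catalog (the datum ID below is a made-up placeholder)
vh execution run inference --adhoc --model=datum://01890a12-3b45-6c78-9d01-23e456f789ab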

Input Sources

Valohai accepts inputs from:

  • Direct URLs: Any publicly accessible HTTP/HTTPS endpoint

  • Cloud storage: S3, Azure Blob, GCS (with configured credentials)

  • Data catalog: Files uploaded to your project or outputs from executions

  • Version control: Files committed to your repository

Connecting Executions

Use outputs from one job as inputs to another:

  1. Train a model → saves to /valohai/outputs/

  2. Reference it with datum:// in your next step

  3. Valohai tracks the lineage automatically

This creates traceable pipelines where you can see exactly which model version produced which predictions.
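As a sketch of step 1, the training code only needs to write into /valohai/outputs/ (the source path below is illustrative; use wherever your trainer writes its weights):

import shutil

# Any file written under /valohai/outputs/ is uploaded when the run ends
# and appears in the data catalog with its own datum:// ID
shutil.copy("runs/detect/train/weights/best.onnx", "/valohai/outputs/best.onnx")

A later step can then set that datum (or an alias like the datum://latest-model used above) as its model input, and Valohai links the two executions in the lineage graph.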

💬 Best Practices

  • Use descriptive input names that indicate the expected file type

  • Set sensible defaults in valohai.yaml for common use cases

  • Override inputs at runtime for experimentation

  • Check file existence before processing to handle optional inputs gracefully (see the sketch below)
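
For the last point, a minimal sketch, assuming a hypothetical input named labels declared with optional: true in valohai.yaml:

import os

labels_dir = "/valohai/inputs/labels/"

# An optional input that received no value leaves the directory empty or
# missing entirely, so guard before reading from it
if os.path.isdir(labels_dir) and os.listdir(labels_dir):
    label_files = sorted(os.listdir(labels_dir))
else:
    label_files = []  # proceed without labels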
