Generate YAML with valohai-utils

Define Valohai steps in Python and generate valohai.yaml automatically

If you prefer to define ML workflows in Python rather than write YAML by hand, the valohai-utils helper library can generate valohai.yaml from your code.

This is optional. Many users write YAML directly to keep their code free of Valohai dependencies.


Why Generate YAML from Python?

Familiar syntax: If you're more comfortable with Python than YAML, this approach feels more natural.

Type safety: Python editors provide autocomplete and type checking, which can catch some errors before execution.

Programmatic generation: Build YAML dynamically based on conditions, loops, or external configs.
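As a rough illustration of programmatic generation, the sketch below builds several step definitions in a loop and renders them in the same layout as a valohai.yaml step. It uses plain Python only and does not call valohai-utils; the MODELS table, the make_step and render_step helpers, and the step names are all hypothetical.

```python
# Hypothetical example: none of these names come from valohai-utils.
MODELS = {"small": 5, "large": 20}  # model name -> default epochs (made-up values)


def make_step(model: str, epochs: int) -> dict:
    """Build one step definition as a plain dictionary."""
    return {
        "name": f"train-{model}",
        "image": "python:3.12",
        "command": "python train.py {parameters}",
        "parameters": [
            {"name": "model", "default": model, "type": "string"},
            {"name": "epochs", "default": epochs, "type": "integer"},
        ],
    }


def render_step(step: dict) -> str:
    """Render a step dictionary as YAML text (handles only this fixed shape)."""
    lines = [
        "- step:",
        f"    name: {step['name']}",
        f"    image: {step['image']}",
        f"    command: {step['command']}",
        "    parameters:",
    ]
    for param in step["parameters"]:
        lines.append(f"      - name: {param['name']}")
        lines.append(f"        default: {param['default']}")
        lines.append(f"        type: {param['type']}")
    return "\n".join(lines)


# One step per model variant, generated from the table above.
steps = [make_step(model, epochs) for model, epochs in MODELS.items()]
yaml_text = "\n\n".join(render_step(step) for step in steps)
print(yaml_text)
```

Adding a model variant then means adding one entry to the table rather than copying a YAML block by hand.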


How It Works

Install valohai-utils:

pip install valohai-utils

Define a step in your Python script:

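# train.py
import valohai

# Default parameter values
params = {
    "epochs": 10,
    "learning_rate": 0.001,
}

# Default inputs
inputs = {
    "dataset": "s3://my-bucket/train.csv"
}

# Declare the step, its image, and its defaults
valohai.prepare(step="train", image="python:3.12", default_parameters=params, default_inputs=inputs)

# Read the resolved parameter values and the local path of the input
epochs = valohai.parameters("epochs").value
lr = valohai.parameters("learning_rate").value
dataset = valohai.inputs("dataset").path()

# Your training code
print(f"Training with lr={lr} for {epochs} epochs")
print(f"Dataset: {dataset}")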

Generate the YAML with the Valohai CLI (the vh command is provided by the valohai-cli package):

vh yaml step train.py

This creates the following valohai.yaml file:

- step:
    name: train
    image: python:3.12
    command: python train.py {parameters}
    parameters:
      - name: learning_rate
        default: 0.001
        type: float
      - name: epochs
        default: 10
        type: integer
    inputs:
      - name: dataset
        default: s3://my-bucket/train.csv
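In the generated YAML, the parameter types line up with the Python types of the defaults: 10 becomes integer and 0.001 becomes float. The sketch below illustrates that kind of mapping. It is not the valohai-utils implementation; infer_type is a hypothetical helper, and the handling of booleans and strings is an assumption based on Valohai's flag and string parameter types.

```python
def infer_type(value) -> str:
    """Map a Python default value to a Valohai parameter type name (illustrative)."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "flag"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    # Assumption: anything else is treated as a string.
    return "string"


params = {"epochs": 10, "learning_rate": 0.001}
inferred = {name: infer_type(default) for name, default in params.items()}
print(inferred)
```

The practical consequence is that writing a default as 10 or as 10.0 changes the type of the parameter, so it is worth choosing defaults deliberately.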

When to Use This Approach

You're Python-first: Your team is more comfortable with Python than YAML syntax.

Dynamic workflows: You need to generate steps programmatically based on runtime conditions.

Rapid prototyping: You want to define and test steps quickly without switching between files.


When NOT to Use This Approach

Keep code clean: If you want your ML code to remain framework-agnostic, write YAML by hand.

Team collaboration: Non-Python users may find YAML easier to read and edit.

Complex pipelines: Large multi-step pipelines are often clearer in YAML than generated from Python.
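For the framework-agnostic case, a script can read the same parameters using only the standard library. The sketch below assumes that the {parameters} placeholder in the step command expands to --name=value arguments, which is our understanding of the default behavior; the parse_args helper and the example argument values are hypothetical.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse training parameters passed as --name=value command-line flags."""
    parser = argparse.ArgumentParser(description="Train without Valohai imports")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    return parser.parse_args(argv)


# Simulate the arguments the platform would append to the command.
args = parse_args(["--epochs=20", "--learning_rate=0.01"])
print(f"Training with lr={args.learning_rate} for {args.epochs} epochs")
```

With this approach the parameter names and defaults live in two places, the script and the hand-written valohai.yaml, so they have to be kept in sync manually.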

