Steps
A step is a reusable blueprint that defines a specific ML workload in your project. Think of it as a template that describes what should happen when you want to run a particular job.
When you actually run a step, you create an execution: a versioned snapshot of that run with its specific inputs and parameters.
💡 One step can generate thousands of executions. Each execution is version-controlled and reproducible.
Why steps matter
Steps give you reproducible, scalable ML workflows. Instead of manually running scripts with different parameters each time, you define the work once and run it as many times as needed.
Common step types include:
Data preprocessing and feature engineering
Model training with hyperparameter sweeps
Model validation and testing
Batch inference and predictions
Model deployment to staging/production
Step vs. Execution

| Step | Execution |
| --- | --- |
| Blueprint/template | Versioned snapshot of the step being run |
| Defined in valohai.yaml | Created when you run a step |
| Reusable | Specific run with exact inputs/parameters |
| Static definition | Has logs, outputs, duration, and lineage |
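As a minimal sketch of how an execution comes into being, assuming you have valohai-cli installed and your project linked, you can launch a step from the command line (the step name here matches the example later on this page):

```bash
# Launch the train-model step as a new execution.
# --adhoc packages and uploads your local working directory
# instead of requiring a committed version of the code.
vh execution run train-model --adhoc
```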
Anatomy of a step
Steps are defined in your project's valohai.yaml file and specify:
Docker image — the environment your code runs in
Commands — what gets executed
Inputs — data files your step needs
Parameters — configurable values (learning rate, epochs, etc.)
Environment — compute requirements
Example: Simple training pipeline
```yaml
---

- step:
    name: preprocess-dataset
    image: python:3.9
    command:
      - pip install numpy valohai-utils
      - python ./preprocess_dataset.py
    inputs:
      - name: dataset
        default: https://valohaidemo.blob.core.windows.net/mnist/mnist.npz

- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command:
      - pip install valohai-utils
      - python ./train_model.py {parameters}
    parameters:
      - name: epochs
        default: 5
        type: integer
      - name: learning_rate
        default: 0.001
        type: float
    inputs:
      - name: dataset
        default: https://valohaidemo.blob.core.windows.net/mnist/preprocessed_mnist.npz

- step:
    name: evaluate-model
    image: tensorflow/tensorflow:2.6.0
    command:
      - pip install valohai-utils scikit-learn
      - python ./evaluate_model.py
    inputs:
      - name: model
      - name: test_data
```
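The example above omits the optional environment key from the anatomy list. As a sketch, a step can pin its compute requirements like this; the slug aws-eu-west-1-g4dn-xlarge is a hypothetical example, so check your project's environment list for real values:

```yaml
- step:
    name: train-model-gpu
    image: tensorflow/tensorflow:2.6.0-gpu
    # Pins this step to a specific machine type. The slug below is a
    # hypothetical example; use one available to your project.
    environment: aws-eu-west-1-g4dn-xlarge
    command:
      - pip install valohai-utils
      - python ./train_model.py {parameters}
```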
Pipeline examples
For production workflows, you might chain steps together:
Data Pipeline: fetch-data → clean-data → feature-engineering
Training Pipeline: train-model → validate-model → register-model
Deployment Pipeline: build-inference-service → deploy-staging → deploy-production
Each step runs independently but can use outputs from previous steps as inputs.
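To make the chaining concrete, here is a minimal sketch of how the first two steps from the example above could be wired into a pipeline in valohai.yaml. The pipeline name, node names, and the *.npz edge wildcard are illustrative assumptions, not part of the original example:

```yaml
- pipeline:
    name: train-from-scratch
    nodes:
      # Each node wraps one of the steps defined earlier.
      - name: preprocess
        type: execution
        step: preprocess-dataset
      - name: train
        type: execution
        step: train-model
    edges:
      # Route any .npz output of the preprocess node into the
      # train node's "dataset" input.
      - [preprocess.output.*.npz, train.input.dataset]
```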
Next steps
Ready to create your first step? Check out:
Quickstart: Run your first job — hands-on tutorial for beginners
Pipeline quickstart — connect multiple steps together
valohai.yaml reference — complete configuration options