Pipelines: Chain Your Jobs
Connect your existing jobs into automated workflows. Define how data flows between steps and let Valohai handle the orchestration.
💡 Already have working steps? You're ready to build pipelines. Just define how outputs connect to inputs.
How Pipelines Work
A pipeline is a recipe for connecting jobs:
Nodes = Your jobs (preprocessing, training, evaluation, etc.)
Edges = Data / information flow (e.g. which outputs become which inputs)
When you run a pipeline, Valohai automatically:
Executes jobs in the right order
Passes outputs between steps into defined inputs
Handles parallel execution where possible
Tracks the complete lineage
Quick Example
Connect three existing steps into a pipeline:
```yaml
- pipeline:
    name: ml-workflow
    nodes:
      - name: preprocess
        type: execution
        step: preprocess-data
      - name: train
        type: execution
        step: train-model
      - name: evaluate
        type: execution
        step: evaluate-model
    edges:
      # Connect outputs → inputs
      - [preprocess.output.*, train.input.dataset]
      - [train.output.model*, evaluate.input.model]
      - [preprocess.output.*test*, evaluate.input.test-data]
```
Run it:
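For example, with the Valohai CLI, run from the directory containing your valohai.yaml:

```bash
vh pipeline run ml-workflow --adhoc
```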
💡 If you have pushed your valohai.yaml to Git and fetched the commit to your Valohai project, you can omit the --adhoc flag.
Complete Example
Let's build a real pipeline with three steps:
1. Define Your Steps (if not already done)
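A sketch of the three step definitions; the Docker image, commands, input names, and parameter defaults are illustrative:

```yaml
- step:
    name: preprocess-data
    image: python:3.10
    command: python preprocess.py
    inputs:
      - name: raw-data
- step:
    name: train-model
    image: python:3.10
    command: python train.py
    inputs:
      - name: dataset
    parameters:
      - name: learning_rate
        type: float
        default: 0.001
- step:
    name: evaluate-model
    image: python:3.10
    command: python evaluate.py
    inputs:
      - name: model
      - name: test-data
```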
2. Connect as Pipeline
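With those steps in place, the pipeline block has the same shape as the Quick Example above:

```yaml
- pipeline:
    name: ml-workflow
    nodes:
      - name: preprocess
        type: execution
        step: preprocess-data
      - name: train
        type: execution
        step: train-model
      - name: evaluate
        type: execution
        step: evaluate-model
    edges:
      - [preprocess.output.*, train.input.dataset]
      - [train.output.model*, evaluate.input.model]
      - [preprocess.output.*test*, evaluate.input.test-data]
```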
Edge Patterns
Basic Output → Input
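A single named output file feeding a single input (the file name clean.csv is illustrative):

```yaml
edges:
  - [preprocess.output.clean.csv, train.input.dataset]
```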
Wildcard Matching
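Wildcards let one edge match several files, using the source patterns listed in the Quick Reference below:

```yaml
edges:
  # All outputs of the preprocess node
  - [preprocess.output.*, train.input.dataset]
  # Only CSV files
  - [preprocess.output.*.csv, train.input.dataset]
  # Files whose names start with "model"
  - [train.output.model*, evaluate.input.model]
```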
Pass parameters and metrics between nodes
In addition to connecting outputs to inputs, edges can also pass parameter and metadata values between nodes.
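For example (the downstream parameter names are illustrative):

```yaml
edges:
  # Forward a parameter value downstream
  - [preprocess.parameter.learning_rate, train.parameter.learning_rate]
  # Feed a logged metadata value into a downstream parameter
  - [train.metadata.accuracy, evaluate.parameter.baseline_accuracy]
```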
Multiple Targets
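The same source can appear in several edges, fanning one node's outputs out to multiple targets:

```yaml
edges:
  - [preprocess.output.*, train.input.dataset]
  - [preprocess.output.*, evaluate.input.test-data]
```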
Advanced Features
Conditional Execution
You can attach conditional actions to pipeline nodes. Each action has three parts:
When: Actions trigger when certain events occur during pipeline execution. The available options include:
node-starting: When a node is about to start.
node-complete: When a node successfully completes.
node-error: When a node encounters an error.
If Condition: The condition that triggers the action can be based on either a metric or a parameter value.
Then: When the condition is met, you can take one of the following actions:
stop-pipeline: Halts the entire pipeline.
require-approval: Pauses the pipeline until a user manually approves the previous results.
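One plausible shape for such an action, assuming the when/if/then keys described above (the accuracy threshold is illustrative):

```yaml
nodes:
  - name: train
    type: execution
    step: train-model
    actions:
      # Halt everything if the logged accuracy metric is too low
      - when: node-complete
        if: metadata.accuracy < 0.9
        then: stop-pipeline
```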
Parallel Execution
In the example below, the nodes train-model-a and train-model-b run in parallel. The ensemble node starts only once both of them have finished.
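A sketch of that topology (step and input names are illustrative); both training nodes depend only on preprocess, so Valohai can run them side by side:

```yaml
nodes:
  - name: preprocess
    type: execution
    step: preprocess-data
  - name: train-model-a
    type: execution
    step: train-model
  - name: train-model-b
    type: execution
    step: train-model
  - name: ensemble
    type: execution
    step: ensemble-models
edges:
  - [preprocess.output.*, train-model-a.input.dataset]
  - [preprocess.output.*, train-model-b.input.dataset]
  - [train-model-a.output.model*, ensemble.input.models]
  - [train-model-b.output.model*, ensemble.input.models]
```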
It is also possible to run Task nodes inside pipelines:
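A task node is declared like an execution node, but with type: task, which runs the step as a parameter sweep:

```yaml
nodes:
  - name: train-sweep
    type: task
    step: train-model
```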
Deployments
In addition to execution and Task nodes, it is possible to create deployments from pipelines.
You can also add pipeline nodes after a deployment node, for example to check the endpoint once it has been created or to clean up old endpoints within the pipeline.
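A sketch with a deployment node, assuming a deployment named my-deployment and the predict-digit endpoint from the reference below:

```yaml
nodes:
  - name: train
    type: execution
    step: train-model
  - name: deploy
    type: deployment
    deployment: my-deployment
    endpoints:
      - predict-digit
edges:
  # Ship the trained model file into the endpoint
  - [train.output.model*, deploy.file.predict-digit.model]
```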
Running Pipelines
From CLI
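For example, with the Valohai CLI:

```bash
# Ad-hoc run using the local valohai.yaml
vh pipeline run ml-workflow --adhoc

# Run against the fetched commit (omit --adhoc, per the note above)
vh pipeline run ml-workflow
```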
Quick Reference
Minimal Pipeline
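A minimal two-node pipeline (all names illustrative):

```yaml
- pipeline:
    name: minimal
    nodes:
      - name: a
        type: execution
        step: step-a
      - name: b
        type: execution
        step: step-b
    edges:
      - [a.output.*, b.input.data]
```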
Edge Syntax
Sources:
node-name.output.* — All outputs
node-name.output.*.csv — Only CSV files
node-name.output.name* — Starts with "name"
node-name.metadata.accuracy — Metadata value
node-name.parameter.learning_rate — Parameter value
deploy.deployment.id / deploy.deployment.version_id — Deployment / deployment version id
Targets:
node-name.input.input-name — Any input available on the node
node-name.parameter.learning_rate — Parameter value
deploy.file.predict-digit.model — File for deployment nodes
Node Types
execution — Run a step
task — Run a parameter sweep
deployment — Create an endpoint
Bottom line: If your steps work individually, connecting them into a pipeline takes just a few lines of YAML.