Pipelines
Valohai pipelines transform complex ML workflows into modular, reusable components. Instead of running monolithic scripts, break your work into steps that can be versioned, reused, and optimized independently.
Why use pipelines?
Automatic checkpointing between steps
Each pipeline step runs as a separate execution, creating natural checkpoints. When something fails, you don't lose hours of computation; you simply restart from the last successful step.
Efficient resource allocation
Different steps need different resources. Your data preprocessing might need high CPU and memory, while training needs GPUs. Pipelines let you specify exact requirements per step, freeing up expensive resources when they're not needed.
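As a rough sketch in valohai.yaml, each step can declare its own default environment (machine type). The environment slugs, images, and step names below are placeholders, not values from this page:

```yaml
# Hypothetical steps; replace the slugs and images with ones available in your project.
- step:
    name: preprocess
    image: python:3.12
    command: python preprocess.py
    environment: aws-eu-west-1-c5-2xlarge   # CPU/memory-optimized machine

- step:
    name: train
    image: pytorch/pytorch:latest
    command: python train.py
    environment: aws-eu-west-1-p3-2xlarge   # GPU machine
```

Each execution gets its own machine, so the GPU instance is provisioned only while the train step runs and released as soon as it finishes.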
Reuse previous work
Made a code change to step 4 of a 6-step pipeline? Use the "reuse nodes" capability to skip steps 1-3 and start directly from your fix. No more waiting for preprocessing to run all over again.
Built for experimentation and production
Pipelines aren't just for production workflows. During experimentation:
Benchmark multiple models against different datasets in parallel
Add conditional logic to explore different paths based on results
Pause for human approval before expensive training steps (see the sketch after this list)
Run hyperparameter tuning as part of the pipeline once data processing has completed
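Conditional stops and approval gates are expressed as pipeline actions attached to a node. A minimal sketch in valohai.yaml, where the step name, metric name, and threshold are placeholders:

```yaml
# Fragment of a pipeline's node list; names and the threshold are placeholders.
nodes:
  - name: train
    type: execution
    step: train
    actions:
      # Human approval: pause the pipeline before this expensive node starts.
      - when: node-starting
        then: require-approval
      # Conditional logic: stop the pipeline (skipping downstream nodes)
      # if this node's logged accuracy falls below the threshold.
      - when: node-complete
        if: metadata.accuracy < 0.9
        then: stop-pipeline
```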
Core concepts
Nodes
Individual jobs within your pipeline:
Executions: Standard Valohai executions running your code
Tasks: Collections of executions with the same code but different parameters/data (perfect for hyperparameter tuning or benchmarking models/datasets)
Deployments: Create new model endpoints as part of your workflow
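In valohai.yaml, each node names its type and the step (or deployment) it runs. A sketch with placeholder node, step, deployment, and endpoint names:

```yaml
# Fragment of a pipeline definition; all names are placeholders.
nodes:
  - name: preprocess
    type: execution        # a single execution of the "preprocess" step
    step: preprocess
  - name: train-sweep
    type: task             # many executions of the "train" step with varied parameters
    step: train
  - name: deploy
    type: deployment       # creates a new version of an existing deployment
    deployment: online-inference
    endpoints:
      - predict
```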
Edges
Connections that pass data between nodes:
Output → Input: Files produced by one node become inputs for the next
Input → Input: Share the same input files across multiple nodes
Parameters → Parameters: Pass parameter values between nodes
Metadata → Parameters: Use runtime-generated values (like optimal hyperparameters) as parameters to downstream nodes
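In valohai.yaml, edges are typically written as [source-node.source-type.source-key, target-node.target-type.target-key] pairs. A sketch of each edge type, with placeholder node, file, parameter, and metadata names:

```yaml
# Fragment of a pipeline definition; all names are placeholders.
edges:
  # Output → Input: a file produced by "preprocess" feeds "train"
  - [preprocess.output.clean.csv, train.input.dataset]
  # Input → Input: share the same input files with another node
  - [train.input.dataset, evaluate.input.dataset]
  # Parameter → Parameter: forward a value such as a random seed downstream
  - [preprocess.parameter.seed, train.parameter.seed]
  # Metadata → Parameter: use a value logged at runtime as a downstream parameter
  - [tune.metadata.best_learning_rate, train.parameter.learning_rate]
```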
When to use pipelines
Pipelines excel when you have:
Multi-step workflows where each step has different resource requirements
Long-running processes where failure recovery matters
Workflows you'll run repeatedly (with small variations)
Complex dependencies between different processing stages
A need for conditional execution or human approval steps
Common patterns
Training pipeline
Preprocess: Clean and transform raw data (CPU-intensive)
Train: Train your model (GPU-intensive)
Evaluate: Test model performance
Deploy: Create an endpoint if the evaluation metrics pass a threshold (sketched below)
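A sketch of this pattern in valohai.yaml, assuming the project already defines steps called preprocess, train, and evaluate and a deployment called online-inference. All names and file patterns are placeholders, and the deployment edge assumes the predict endpoint expects a file named model:

```yaml
# Hypothetical pipeline; step names, the deployment, and file patterns are placeholders.
- pipeline:
    name: training-pipeline
    nodes:
      - name: preprocess
        type: execution
        step: preprocess            # CPU-heavy data preparation
      - name: train
        type: execution
        step: train                 # GPU training
      - name: evaluate
        type: execution
        step: evaluate
        # To require metrics to pass a threshold before deploying, attach an
        # actions block here (as sketched earlier) that stops the pipeline
        # when the evaluation metadata falls short.
      - name: deploy
        type: deployment
        deployment: online-inference
        endpoints:
          - predict
    edges:
      - [preprocess.output.*, train.input.dataset]
      - [train.output.model*, evaluate.input.model]
      - [train.output.model*, deploy.file.predict.model]
```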
Experimentation pipeline
Prepare datasets: Create train/validation/test splits
Hyperparameter search: Run parallel training jobs with different parameters
Compare results: Analyze performance across experiments
Select best model: Automatically identify top performer
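A sketch of this pattern with placeholder names. The search node is a task, so it fans out into parallel executions of the train step; the parameter variations are chosen when the pipeline run is created:

```yaml
# Hypothetical pipeline; step and node names are placeholders.
- pipeline:
    name: experimentation-pipeline
    nodes:
      - name: split
        type: execution
        step: prepare-datasets
      - name: search
        type: task                  # parallel training runs with different parameters
        step: train
      - name: compare
        type: execution
        step: compare-results       # analyze metrics and pick the top performer
    edges:
      - [split.output.*, search.input.dataset]
      - [search.output.*, compare.input.candidates]
```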
Production pipeline (scheduled)
Fetch new data: Pull latest data from your warehouse
Validate quality: Check data integrity and distributions
Retrain model: Update model with new data
A/B test: Deploy to staging for comparison
Promote: Move to production after approval
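A compact sketch of this pattern with placeholder names (the A/B-test stage is omitted). The schedule itself is attached outside the YAML, for example as a scheduled trigger in the project settings:

```yaml
# Hypothetical pipeline; step names, inputs, and the quality check are placeholders.
- pipeline:
    name: scheduled-retraining
    nodes:
      - name: fetch
        type: execution
        step: fetch-data
      - name: validate
        type: execution
        step: validate-data
        # A stop-pipeline action here (as sketched earlier) can halt the run
        # when data-quality metadata falls outside expected bounds.
      - name: retrain
        type: execution
        step: train
      - name: promote
        type: execution
        step: promote-model
        actions:
          - when: node-starting
            then: require-approval   # a human signs off before promotion
    edges:
      - [fetch.output.*, validate.input.raw-data]
      - [validate.output.*, retrain.input.dataset]
      - [retrain.output.model*, promote.input.model]
```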