Execution Reuse and Caching
Skip redundant computations by reusing results from previous executions. When Valohai detects an identical step configuration, it uses cached results instead of running the step again.
How execution reuse saves time
Consider this scenario: You're iterating on a model architecture, but your 3-hour data preprocessing step hasn't changed. With execution reuse:
First run: All steps execute normally
Second run (after model code changes): Preprocessing is skipped, saving 3 hours
Result: Iterate on model development 5x faster
When executions are reused
Valohai reuses an execution when ALL of these match exactly:
Source code: Same Git commit or file contents
Parameters: Identical parameter values
Input data: Same files (verified by checksums)
Docker image: Same container environment
Step name: Same step definition
If any element differs, the step runs fresh to ensure reproducibility.
Enable execution reuse
Method 1: Pipeline-wide in valohai.yaml
Enable for all runs of a pipeline:
Method 2: Per-run in the web interface
Toggle reuse for individual pipeline runs:

💡 Use the web interface to temporarily disable reuse when you need fresh results despite unchanged inputs.
Practical examples
Data science iteration workflow
Reuse pattern:
Data preparation: Reused 95% of the time
Feature engineering: Reused 70% of the time
Model training: Runs fresh but starts immediately with cached inputs
Understanding cache behavior
What triggers a fresh run?
Any change to:
Best practices
1. Structure pipelines for maximum reuse
2. Use deterministic operations
3. Version your data explicitly
4. Monitor reuse effectiveness
In the pipeline view, reused executions show a special indicator. Track reuse rates to optimize pipeline structure.
Manual execution reuse
Besides automatic reuse, you can manually select specific past executions to use as pipeline nodes. This is useful when:
You have a perfect execution from last week you want to reuse
You're building a pipeline incrementally, testing one node at a time
You want to skip expensive steps during development
Reuse via web interface
Click on the Reuse nodes button
Select from the Pipeline from which to reuse
Click the checkboxes on what nodes you want to reuse
The node will use that execution's outputs without running again

Reuse via API
For programmatic pipeline creation, use reuse_execution_id instead of a template:
Manual vs automatic reuse
When to use
Iterative development with small changes
Building pipelines with known good executions
Selection
System finds matching execution
You choose specific execution
Flexibility
Based on exact parameter/input match
Use any compatible execution
Use case
"Run this again if nothing changed"
"Use that great run from Tuesday"
Last updated
Was this helpful?
