Execution Reuse and Caching

Skip redundant computations by reusing results from previous executions. When Valohai detects an identical step configuration, it uses cached results instead of running the step again.

How execution reuse saves time

Consider this scenario: You're iterating on a model architecture, but your 3-hour data preprocessing step hasn't changed. With execution reuse:

  1. First run: All steps execute normally

  2. Second run (after model code changes): Preprocessing is skipped, saving 3 hours

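  3. Result: Later iterations run only the steps that changed, which can make model development up to 5x faster, depending on how long the remaining steps take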
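To make the scenario above concrete, here is a rough back-of-the-envelope calculation. Only the 3-hour preprocessing figure comes from the scenario; the 45-minute training duration is an assumption chosen for illustration, and the actual speedup depends on how long your remaining steps take.

```python
# Step durations in hours. Only the 3-hour preprocessing figure comes from
# the scenario; the training duration is an assumed value for illustration.
PREPROCESS_HOURS = 3.0
TRAIN_HOURS = 0.75  # assumption: 45 minutes of model training


def pipeline_hours(reuse_preprocessing: bool) -> float:
    """Return the wall-clock hours for one pipeline run."""
    preprocess = 0.0 if reuse_preprocessing else PREPROCESS_HOURS
    return preprocess + TRAIN_HOURS


first_run = pipeline_hours(reuse_preprocessing=False)   # every step executes
second_run = pipeline_hours(reuse_preprocessing=True)   # preprocessing is reused
speedup = first_run / second_run

print(f"first run: {first_run} h, second run: {second_run} h, speedup: {speedup}x")
```

Under these assumed durations the second run takes 0.75 hours instead of 3.75. A longer training step would shrink the ratio, and a shorter one would grow it.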

When executions are reused

Valohai reuses an execution when ALL of these match exactly:

  • Source code: Same Git commit or file contents

  • Parameters: Identical parameter values

  • Input data: Same files (verified by checksums)

  • Docker image: Same container environment

  • Step name: Same step definition

If any element differs, the step runs fresh to ensure reproducibility.
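One way to picture the matching rule is as a fingerprint computed over all five elements. The sketch below is an illustration only, not Valohai's actual implementation: the function names, the payload layout, and the choice of SHA-256 are all assumptions.

```python
import hashlib
import json


def file_checksum(data: bytes) -> str:
    """Checksum of a file's contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def step_fingerprint(commit, parameters, input_checksums, image, step_name) -> str:
    """Hypothetical fingerprint covering every element that must match."""
    payload = {
        "commit": commit,
        "parameters": parameters,
        "inputs": input_checksums,
        "image": image,
        "step": step_name,
    }
    # sort_keys gives a canonical encoding, so dict ordering never matters.
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


inputs = {"dataset": file_checksum(b"id,label\n1,cat\n2,dog\n")}

baseline = step_fingerprint("abc123", {"epochs": 10}, inputs, "python:3.11", "preprocess")
repeat = step_fingerprint("abc123", {"epochs": 10}, inputs, "python:3.11", "preprocess")
changed = step_fingerprint("abc123", {"epochs": 20}, inputs, "python:3.11", "preprocess")

print("identical configuration matches:", baseline == repeat)
print("changed parameter matches:", baseline == changed)
```

An identical configuration produces the same fingerprint, while changing a single parameter value produces a different one, which corresponds to the step running fresh.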

Enable execution reuse

Method 1: Pipeline-wide in valohai.yaml

Enable for all runs of a pipeline:

Method 2: Per-run in the web interface

Toggle reuse for individual pipeline runs:

💡 Use the web interface to temporarily disable reuse when you need fresh results despite unchanged inputs.

Practical examples

Data science iteration workflow

Reuse pattern:

  • Data preparation: Reused 95% of the time

  • Feature engineering: Reused 70% of the time

  • Model training: Runs fresh but starts immediately with cached inputs

Understanding cache behavior

What triggers a fresh run?

Any change to the source code, parameters, input data, Docker image, or step definition triggers a fresh run.

Best practices

1. Structure pipelines for maximum reuse

2. Use deterministic operations
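A plausible reason for this practice is that reused outputs are only interchangeable with fresh ones when the step produces the same result for the same inputs. The sketch below shows one common way to get there, a fixed random seed for a train/test split. The function name, the seed value, and the split fraction are hypothetical.

```python
import random


def split_dataset(rows, seed=42, test_fraction=0.2):
    """Shuffle and split rows reproducibly using a locally seeded generator."""
    rng = random.Random(seed)  # local generator, leaves global random state alone
    shuffled = list(rows)      # copy so the caller's data is not mutated
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))
train_a, test_a = split_dataset(rows)
train_b, test_b = split_dataset(rows)

print("same split on repeated calls:", (train_a, test_a) == (train_b, test_b))
```

Exposing the seed as a step parameter would also make it part of the configuration that is compared for reuse, so changing the seed would trigger a fresh run.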

3. Version your data explicitly

4. Monitor reuse effectiveness

In the pipeline view, reused executions show a special indicator. Track reuse rates to optimize pipeline structure.
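If you record which nodes were reused in each run, the reuse rate per step is simple to compute. The sketch below assumes you have collected that information yourself, for example by noting the indicator in the pipeline view. The data layout and the step names are hypothetical and not an export format provided by Valohai.

```python
from collections import defaultdict


def reuse_rates(runs):
    """Return the fraction of runs in which each step was reused."""
    counts = defaultdict(lambda: [0, 0])  # step name -> [reused, total]
    for run in runs:
        for step, was_reused in run.items():
            counts[step][1] += 1
            if was_reused:
                counts[step][0] += 1
    return {step: reused / total for step, (reused, total) in counts.items()}


# Hypothetical history: one dict per pipeline run, True means the node was reused.
history = [
    {"prepare": False, "features": False, "train": False},
    {"prepare": True, "features": True, "train": False},
    {"prepare": True, "features": False, "train": False},
    {"prepare": True, "features": True, "train": False},
]

rates = reuse_rates(history)
for step, rate in rates.items():
    print(f"{step}: reused in {rate:.0%} of runs")
```

A step with a low reuse rate that sits upstream of expensive steps is a candidate for splitting, so that its stable parts can be cached separately.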

Manual execution reuse

Besides automatic reuse, you can manually select specific past executions to use as pipeline nodes. This is useful when:

  • You have a perfect execution from last week you want to reuse

  • You're building a pipeline incrementally, testing one node at a time

  • You want to skip expensive steps during development

Reuse via web interface

  1. Click the Reuse nodes button

  2. Select the pipeline whose nodes you want to reuse

  3. Check the boxes next to the nodes you want to reuse

  4. Each selected node uses that execution's outputs without running again

Reuse via API

For programmatic pipeline creation, use reuse_execution_id instead of a template:
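As a rough sketch, a node definition could be built as shown below. Only the `reuse_execution_id` and `template` names come from the text above. The helper function, the other field names, the step names, and the placeholder ID are assumptions for illustration, so check the API reference for the exact payload shape before relying on it.

```python
from typing import Optional


def build_node(name: str, step: str, reuse_execution_id: Optional[str] = None) -> dict:
    """Build a pipeline node dict (hypothetical shape, not the confirmed API schema)."""
    node = {"name": name, "type": "execution"}  # assumed field names
    if reuse_execution_id is not None:
        # Reused node: point at an existing execution instead of a template.
        node["reuse_execution_id"] = reuse_execution_id
    else:
        # Fresh node: describe what to run (template contents are assumed).
        node["template"] = {"step": step}
    return node


nodes = [
    build_node("preprocess", "preprocess-data", reuse_execution_id="<execution-id>"),
    build_node("train", "train-model"),
]

for node in nodes:
    print(node)
```

Here the preprocessing node reuses a past execution and carries no template, while the training node runs fresh from its template.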

Manual vs automatic reuse

Aspect        Automatic Reuse                             Manual Reuse
When to use   Iterative development with small changes    Building pipelines with known good executions
Selection     System finds matching execution             You choose specific execution
Flexibility   Based on exact parameter/input match        Use any compatible execution
Use case      "Run this again if nothing changed"         "Use that great run from Tuesday"
