# Execution Reuse and Caching

Skip redundant computations by reusing results from previous executions. When Valohai detects an identical step configuration, it uses cached results instead of running the step again.

### How execution reuse saves time

Consider this scenario: You're iterating on a model architecture, but your 3-hour data preprocessing step hasn't changed. With execution reuse:

1. **First run**: All steps execute normally
2. **Second run** (after model code changes): Preprocessing is skipped, saving 3 hours
3. **Result**: Each iteration runs only the steps that changed (for example, 5x faster if the remaining steps take 45 minutes)
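
The arithmetic behind this scenario can be sketched as follows. Only the 3-hour preprocessing time comes from the scenario; the 45-minute training time is an assumption chosen for illustration, and queueing and startup overhead are ignored.

```python
# Illustrative durations in minutes. Only the 3-hour preprocessing figure
# comes from the scenario above; the training time is an assumption.
PREPROCESS_MIN = 180
TRAIN_MIN = 45  # hypothetical


def iteration_minutes(preprocess_reused: bool) -> int:
    """Return wall-clock minutes for one pipeline run."""
    preprocess = 0 if preprocess_reused else PREPROCESS_MIN
    return preprocess + TRAIN_MIN


first_run = iteration_minutes(preprocess_reused=False)
later_run = iteration_minutes(preprocess_reused=True)
speedup = first_run / later_run

print(f"first run: {first_run} min, later runs: {later_run} min, speedup: {speedup:.1f}x")
```

The shorter the steps that still run, the larger the relative saving; with a longer training step the speedup shrinks accordingly.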

### When executions are reused

Valohai reuses an execution when ALL of these match exactly:

* **Source code**: Same Git commit or file contents
* **Parameters**: Identical parameter values
* **Input data**: Same files (verified by checksums)
* **Docker image**: Same container environment
* **Step name**: Same step definition

If any element differs, the step runs fresh to ensure reproducibility.
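
One way to picture the all-or-nothing match is as a fingerprint over the listed elements. The sketch below is a mental model only, not Valohai's actual implementation; the function name `reuse_key`, the commit `abc123`, the checksum strings, and the image name are all hypothetical.

```python
import hashlib
import json


def reuse_key(step, commit, parameters, input_checksums, image):
    """Build a fingerprint from the elements that must match for reuse."""
    payload = {
        "step": step,
        "commit": commit,
        "parameters": parameters,
        # Sort checksums so file ordering does not affect the key.
        "inputs": sorted(input_checksums),
        "image": image,
    }
    # sort_keys gives a stable serialization for equal dictionaries.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


base = reuse_key("train-model", "abc123", {"batch_size": 32}, ["sha256:aa"], "python:3.11")
same = reuse_key("train-model", "abc123", {"batch_size": 32}, ["sha256:aa"], "python:3.11")
changed = reuse_key("train-model", "abc123", {"batch_size": 64}, ["sha256:aa"], "python:3.11")

print(base == same)     # True: identical configuration
print(base == changed)  # False: one parameter differs
```

Because every element feeds the same fingerprint, a change to any single one produces a different key, which corresponds to a fresh run.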

### Enable execution reuse

#### Method 1: Pipeline-wide in valohai.yaml

Enable for all runs of a pipeline:

```yaml
- pipeline:
    name: model-training
    reuse-executions: true  # Enable caching
    nodes:
      - name: preprocess
        type: execution
        step: preprocess-dataset
      - name: train
        type: execution
        step: train-model
      - name: evaluate
        type: execution
        step: evaluate-model
    edges:
      - [preprocess.output.*, train.input.dataset]
      - [train.output.model, evaluate.input.model]
```

#### Method 2: Per-run in the web interface

Toggle reuse for individual pipeline runs:

<figure><img src="/files/usOw29rc8WBVtI9i3Shd" alt=""><figcaption></figcaption></figure>

> 💡 *Use the web interface to temporarily disable reuse when you need fresh results despite unchanged inputs.*

### Practical examples

#### Data science iteration workflow

```yaml
- pipeline:
    name: experiment-pipeline
    reuse-executions: true
    nodes:
      # This rarely changes - perfect for reuse
      - name: fetch-and-clean
        type: execution
        step: data-preparation

      # This might change - but reuse when it doesn't
      - name: feature-engineering
        type: execution
        step: create-features

      # This changes frequently - but benefits from upstream reuse
      - name: train-experiment
        type: execution
        step: train-model
```

**Reuse pattern** (illustrative figures):

* Data preparation: Reused 95% of the time
* Feature engineering: Reused 70% of the time
* Model training: Runs fresh but starts immediately with cached inputs

### Understanding cache behavior

#### What triggers a fresh run?

Any change to:

```yaml
# Parameters
parameters:
  - name: batch_size
    default: 32  # Changing to 64 = fresh run

# Inputs
inputs:
  - name: dataset
    default: s3://bucket/v1/*.csv  # New files = fresh run

# Code
command: python train.py  # Different commit = fresh run

# Environment
environment: aws-p3-2xlarge  # Different instance = fresh run
```
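
When a step reruns unexpectedly, comparing the two configurations field by field shows what changed. The following is a minimal sketch that assumes you have both configurations available as plain dictionaries; `changed_fields` and the commit value are hypothetical and not part of any Valohai tooling.

```python
def changed_fields(previous: dict, current: dict) -> list[str]:
    """Return the sorted names of fields whose values differ between two runs."""
    keys = previous.keys() | current.keys()
    return sorted(k for k in keys if previous.get(k) != current.get(k))


previous_run = {
    "parameters": {"batch_size": 32},
    "inputs": {"dataset": "s3://bucket/v1/*.csv"},
    "commit": "abc123",  # hypothetical commit identifier
    "environment": "aws-p3-2xlarge",
}
# Same run, except the batch size was changed from 32 to 64.
current_run = {**previous_run, "parameters": {"batch_size": 64}}

diff = changed_fields(previous_run, current_run)
print(diff or "no changes: eligible for reuse")
```

Here only `parameters` is reported, which is enough to explain the fresh run.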

### Best practices

#### 1. Structure pipelines for maximum reuse

```yaml
# Good: Separate volatile and stable steps
nodes:
  - name: stable-preprocessing  # Changes monthly
  - name: volatile-training     # Changes daily
```

```yaml
# Bad: Combining volatile and stable logic
nodes:
  - name: preprocess-and-train  # Any change reruns everything
```

#### 2. Use deterministic operations

```python
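import pandas as pd


# Good: Deterministic preprocessing
def preprocess(data: pd.DataFrame) -> pd.DataFrame:
    return data.sort_values("id").reset_index(drop=True)


# Bad: Non-deterministic operations
def preprocess_random(data: pd.DataFrame) -> pd.DataFrame:
    return data.sample(frac=0.8)  # Random sampling without a seed: results differ between runs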
```
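
If a step genuinely needs sampling, fixing the seed is one way to keep its output reproducible, so that a reused result matches what a rerun would have produced. The sketch below uses only the standard library; `sample_rows` and the default seed are illustrative choices. In pandas, passing `random_state` to `DataFrame.sample` serves the same purpose.

```python
import random


def sample_rows(rows, fraction, seed=42):
    """Return a reproducible random subset of rows."""
    rng = random.Random(seed)  # Local generator; global random state is untouched.
    k = int(len(rows) * fraction)
    return rng.sample(rows, k)


rows = list(range(100))
first = sample_rows(rows, 0.8)
second = sample_rows(rows, 0.8)
print(first == second)  # True: same seed, same subset
```

Exposing the seed as a step parameter would also make a deliberate reshuffle count as a parameter change.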

#### 3. Version your data explicitly

```yaml
inputs:
  - name: dataset
    # Good: Versioned data
    default: s3://bucket/data/v2.1/train.parquet

    # Bad: Mutable references
    # default: s3://bucket/data/latest/train.parquet
```
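
A small check can flag input URLs that look mutable before they reach a pipeline. This is a heuristic sketch: the marker names in `MUTABLE_MARKERS` are assumed naming conventions, not anything Valohai enforces, and should be adapted to your own bucket layout.

```python
MUTABLE_MARKERS = ("latest", "current", "tmp")  # hypothetical naming conventions


def find_mutable_inputs(urls):
    """Return the URLs with a path segment that suggests mutable content."""
    flagged = []
    for url in urls:
        segments = url.lower().split("/")
        if any(segment in MUTABLE_MARKERS for segment in segments):
            flagged.append(url)
    return flagged


inputs = [
    "s3://bucket/data/v2.1/train.parquet",
    "s3://bucket/data/latest/train.parquet",
]
print(find_mutable_inputs(inputs))
```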

#### 4. Monitor reuse effectiveness

In the pipeline view, reused executions show a special indicator. Track reuse rates to optimize pipeline structure.
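
If you record per node whether each run was reused, the reuse rate is a simple ratio. The sketch below assumes a hand-collected history of `(node name, was reused)` pairs; both the record format and the sample data are hypothetical.

```python
from collections import defaultdict


def reuse_rates(records):
    """Return the fraction of runs in which each node was reused."""
    totals = defaultdict(int)
    reused = defaultdict(int)
    for node, was_reused in records:
        totals[node] += 1
        reused[node] += int(was_reused)
    return {node: reused[node] / totals[node] for node in totals}


# Hypothetical history: (node name, whether the run reused a past execution).
history = [
    ("preprocess", False),
    ("preprocess", True),
    ("preprocess", True),
    ("preprocess", True),
    ("train", False),
    ("train", False),
    ("train", True),
    ("train", False),
]
rates = reuse_rates(history)
print(rates)
```

A node with a low rate that sits upstream of expensive steps is a candidate for splitting into stable and volatile parts.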

### Manual execution reuse

Besides automatic reuse, you can manually select specific past executions to use as pipeline nodes. This is useful when:

* You have a perfect execution from last week you want to reuse
* You're building a pipeline incrementally, testing one node at a time
* You want to skip expensive steps during development

#### Reuse via web interface

1. Click the **Reuse nodes** button
2. Choose a pipeline under **Pipeline from which to reuse**
3. Select the checkboxes for the nodes you want to reuse
4. Each selected node uses that execution's outputs without running again

<figure><img src="/files/ad9pFzxJjiR86vmSH5GM" alt=""><figcaption></figcaption></figure>

#### Reuse via API

For programmatic pipeline creation, use `reuse_execution_id` instead of a template:

```python
import requests
import os

pipeline_config = {
    "project": "PROJECT_ID",
    "title": "experiment-with-reuse",
    "nodes": [
        {
            "name": "preprocess",
            "type": "execution",
            "reuse_execution_id": "exec_123456",  # Reuse past execution
        },
        {
            "name": "train",
            "type": "execution",
            "template": {  # Run fresh
                "step": "train-model",
                "environment": "aws-p3-2xlarge",
                "commit": "main",
            },
        },
    ],
    "edges": [
        ["preprocess.output.*", "train.input.dataset"],
    ],
}

response = requests.post(
    "https://app.valohai.com/api/v0/pipelines/",
    json=pipeline_config,
    headers={
        "Authorization": f"Token {os.getenv('VH_TOKEN')}",
        "Content-Type": "application/json",
    },
    timeout=30,
)
response.raise_for_status()  # Fail loudly on an HTTP error response
```
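
When building many nodes programmatically, a small helper can keep reused and fresh nodes from being mixed up. The sketch below only assembles dictionaries in the shape shown above; `make_node` is a hypothetical helper, and the rule that a node carries exactly one of `reuse_execution_id` or `template` is an assumption drawn from the example, not a documented API constraint.

```python
def make_node(name, *, reuse_execution_id=None, template=None):
    """Build one pipeline node that either reuses an execution or runs fresh."""
    # Assumption: a node has exactly one of the two fields, never both or neither.
    if (reuse_execution_id is None) == (template is None):
        raise ValueError("Provide exactly one of reuse_execution_id or template")
    node = {"name": name, "type": "execution"}
    if reuse_execution_id is not None:
        node["reuse_execution_id"] = reuse_execution_id
    else:
        node["template"] = template
    return node


nodes = [
    make_node("preprocess", reuse_execution_id="exec_123456"),
    make_node(
        "train",
        template={"step": "train-model", "environment": "aws-p3-2xlarge", "commit": "main"},
    ),
]
print(nodes)
```

The resulting list can be used as the `"nodes"` value of a pipeline configuration like the one above.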

#### Manual vs automatic reuse

| Aspect          | Automatic Reuse                          | Manual Reuse                                  |
| --------------- | ---------------------------------------- | --------------------------------------------- |
| **When to use** | Iterative development with small changes | Building pipelines with known good executions |
| **Selection**   | System finds matching execution          | You choose specific execution                 |
| **Flexibility** | Based on exact parameter/input match     | Use any compatible execution                  |
| **Use case**    | "Run this again if nothing changed"      | "Use that great run from Tuesday"             |


