# Pipelines

Valohai pipelines transform complex ML workflows into modular, reusable components. Instead of running monolithic scripts, break your work into steps that can be versioned, reused, and optimized independently.

### Why use pipelines?

#### Automatic checkpointing between steps

Each pipeline step runs as a separate execution, creating natural checkpoints. When something fails, you don't lose hours of computation; you restart from the last successful step.

#### Efficient resource allocation

Different steps need different resources. Your data preprocessing might need high CPU and memory, while training needs GPUs. Pipelines let you specify exact requirements per step, freeing up expensive resources when they're not needed.

#### Reuse previous work

Made a code change to step 4 of a 6-step pipeline? Use the "reuse nodes" capability to skip steps 1-3 and start directly from your fix. No more waiting for preprocessing to finish again.
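
The idea behind reusing nodes can be sketched in plain Python: the changed step and everything downstream of it must rerun, while everything else can be reused. This is only an illustration of the logic, not Valohai's implementation; the step names and the `nodes_to_rerun` helper are hypothetical.

```python
from collections import deque


def nodes_to_rerun(edges, changed):
    """Return the changed node plus every node downstream of it."""
    downstream = {}
    for source, target in edges:
        downstream.setdefault(source, []).append(target)

    rerun = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, []):
            if nxt not in rerun:
                rerun.add(nxt)
                queue.append(nxt)
    return rerun


# Hypothetical 6-step linear pipeline: step1 -> step2 -> ... -> step6
steps = [f"step{i}" for i in range(1, 7)]
edges = list(zip(steps, steps[1:]))

rerun = nodes_to_rerun(edges, "step4")
reused = [s for s in steps if s not in rerun]
print(reused)  # ['step1', 'step2', 'step3']
```

With a change in step 4, steps 1-3 keep their earlier results and only steps 4-6 run again.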

#### Built for experimentation and production

Pipelines aren't just for production workflows. During experimentation:

* Benchmark multiple models against different datasets in parallel
* Add conditional logic to explore different paths based on results
* Pause for human approval before expensive training steps
* Run hyperparameter tuning as a pipeline after data processing has been completed

### Core concepts

#### Nodes

Individual jobs within your pipeline:

* **Executions**: Standard Valohai executions running your code
* **Tasks**: Collections of executions with the same code but different parameters/data (perfect for hyperparameter tuning or benchmarking models/datasets)
* **Deployments**: Create new model endpoints as part of your workflow
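
To make the task concept concrete, the sketch below expands a parameter grid into one parameter set per execution. It is a plain-Python illustration of "same code, different parameters"; the parameter names and the `expand_grid` helper are hypothetical, and this is not how Valohai generates task executions internally.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict of parameter values per combination in the grid."""
    names = list(grid)
    return [
        dict(zip(names, values))
        for values in product(*(grid[name] for name in names))
    ]


# Hypothetical hyperparameter grid
grid = {"learning_rate": [0.001, 0.01], "batch_size": [32, 64]}
runs = expand_grid(grid)

for params in runs:
    print(params)  # each dict would parameterize one execution in the task
```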

#### Edges

Connections that pass data between nodes:

* **Output → Input**: Files produced by one node become inputs for the next
* **Input → Input**: Share the same input files across multiple nodes
* **Parameters → Parameters**: Pass parameter values between nodes
* **Metadata → Parameters**: Use runtime-generated values (like optimal hyperparameters) as parameters to downstream nodes
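
Nodes and edges together form a directed graph, and the edges determine the order in which nodes can run. The sketch below models that in plain Python. The dotted endpoint strings (`node.kind.name`), the node names, and the helper functions are illustrative assumptions; consult the pipeline configuration reference for the exact syntax Valohai expects.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import NamedTuple


class Endpoint(NamedTuple):
    node: str
    kind: str  # "output", "input", "parameter", or "metadata"
    name: str


def parse_endpoint(text):
    """Split 'node.kind.name' into its three parts."""
    node, kind, name = text.split(".", 2)
    return Endpoint(node, kind, name)


def execution_order(edges):
    """Order nodes so that every source runs before its targets."""
    deps = {}
    for source, target in edges:
        src, dst = parse_endpoint(source), parse_endpoint(target)
        deps.setdefault(src.node, set())
        deps.setdefault(dst.node, set()).add(src.node)
    return list(TopologicalSorter(deps).static_order())


# Hypothetical edges covering three of the edge types listed above
edges = [
    ("preprocess.output.dataset", "train.input.dataset"),
    ("train.output.model", "evaluate.input.model"),
    ("train.metadata.best_lr", "evaluate.parameter.learning_rate"),
]

order = execution_order(edges)
print(order)  # ['preprocess', 'train', 'evaluate']
```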

### When to use pipelines

Pipelines excel when you have:

* Multi-step workflows where each step has different resource requirements
* Long-running processes where failure recovery matters
* Workflows you'll run repeatedly (with small variations)
* Complex dependencies between different processing stages
* A need for conditional execution or human approval steps

### Common patterns

#### Training pipeline

1. **Preprocess**: Clean and transform raw data (CPU-intensive)
2. **Train**: Train your model (GPU-intensive)
3. **Evaluate**: Test model performance
4. **Deploy**: Create endpoint if metrics pass threshold
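
The gate in the final step boils down to comparing evaluation metrics against minimum values. A minimal sketch of that check follows; the metric names, thresholds, and the `should_deploy` helper are hypothetical, and how the condition is wired into an actual pipeline is not shown here.

```python
def should_deploy(metrics, thresholds):
    """Return True only if every required metric is present and meets its minimum."""
    return all(
        name in metrics and metrics[name] >= minimum
        for name, minimum in thresholds.items()
    )


# Hypothetical thresholds and evaluation results
thresholds = {"accuracy": 0.90, "f1": 0.85}

print(should_deploy({"accuracy": 0.93, "f1": 0.88}, thresholds))  # True
print(should_deploy({"accuracy": 0.93, "f1": 0.80}, thresholds))  # False
print(should_deploy({"accuracy": 0.93}, thresholds))              # False
```

Treating a missing metric as a failure is a deliberate choice in this sketch: a model is not deployed unless every required metric was actually reported.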

#### Experimentation pipeline

1. **Prepare datasets**: Create train/validation/test splits
2. **Hyperparameter search**: Run parallel training jobs with different parameters
3. **Compare results**: Analyze performance across experiments
4. **Select best model**: Automatically identify top performer

#### Production pipeline (scheduled)

1. **Fetch new data**: Pull latest data from your warehouse
2. **Validate quality**: Check data integrity and distributions
3. **Retrain model**: Update model with new data
4. **A/B test**: Deploy to staging for comparison
5. **Promote**: Move to production after approval


