# Debug Pipeline Failures

When a pipeline fails, first determine whether the problem is at the node level (an execution failed) or at the pipeline level (a configuration error). This guide covers both failure types and strategies for debugging each.

### Understanding pipeline logs

Pipelines have two types of logs:

#### Node logs

Individual execution logs for each step:

1. Click on any node in the pipeline graph
2. View the execution details and logs
3. Check the **Logs** tab for error messages

#### Pipeline logs

System-level logs for the pipeline orchestration:

1. Navigate to the pipeline view
2. Click the **Logs** tab
3. Look for configuration or dependency errors

<figure><img src="https://4109720758-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ff3mjTRQNkASbnMbJqzJ2%2Fuploads%2Fgit-blob-1081f3e0ec581fb81c50c368c1fa87cdce56a173%2Fimage.png?alt=media" alt=""><figcaption></figcaption></figure>

### Common failure patterns

#### Node execution failures

**Symptom**: Node shows as "Failed" in red

**How to debug** (in the UI):

1. Click the failed node
2. Select "View execution"
3. Check the Logs tab

**Common causes**:

* Code errors (Python exceptions, import failures)
* Out of memory or disk space
* Missing dependencies in Docker image
* Incorrect file paths

#### Pipeline configuration errors

**Symptom**: Pipeline fails to start or shows "Crashed"

**Pipeline log messages and solutions**:

```
Node "train" transitioned to crashed
```

**Cause**: Execution failed within the node\
**Fix**: Check the node's execution logs for the actual error

```
Stopping due to 1 incompletable edges
```

**Cause**: Required inputs missing\
**Fix**: Verify all edges are correctly defined and source nodes produce expected outputs

```
No valid environment found for node
```

**Cause**: Specified environment doesn't exist or user lacks access\
**Fix**: Check the environment slug with `vh environments` and verify that you have permission to use it

### Step-by-step debugging process

#### 1. Examine pipeline logs first

Look for orchestration issues:

* Missing edges
* Invalid node references
* Parameter mismatches
* Environment problems
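
Invalid node references in particular can often be spotted before the pipeline runs. The following is a minimal sketch, not part of Valohai's tooling: `find_invalid_edges` is a hypothetical helper, and it assumes edges are written in the `[source_node.output.name, target_node.input.name]` shorthand and that the node names have already been read from `valohai.yaml`.

```python
def find_invalid_edges(node_names, edges):
    """Return edges whose source or target node is not a defined node.

    Hypothetical helper: assumes each edge is a [source, target] pair
    written as "node.output.name" / "node.input.name".
    """
    known = set(node_names)
    invalid = []
    for source, target in edges:
        # The node name is everything before the first dot.
        source_node = source.split(".", 1)[0]
        target_node = target.split(".", 1)[0]
        if source_node not in known or target_node not in known:
            invalid.append((source, target))
    return invalid


# Illustrative data only
nodes = ["preprocess", "train"]
edges = [
    ["preprocess.output.features", "train.input.features"],
    ["preprocess.output.features", "evaluate.input.features"],  # "evaluate" is not defined
]
print(find_invalid_edges(nodes, edges))
```

An empty result means every edge points at a defined node. It does not confirm that the referenced outputs or inputs exist.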

#### 2. Check individual node logs

For execution failures:

**Via web interface**:

1. Click on the failed node (red) in the pipeline graph
2. Click "View execution" in the popup
3. Navigate to the "Logs" tab
4. Use the log filters to show/hide stdout, stderr, or system logs

**Via CLI**:

```shell
# View the logs of the execution behind a node
vh execution logs EXECUTION_ID

# Or save the full logs to a file for offline inspection
vh execution logs EXECUTION_ID > debug_logs.txt
```
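
Once the logs are saved to a file, a short script can point you at the lines worth reading first. This is a rough sketch: the `find_error_lines` helper and the pattern list are assumptions chosen for illustration, not a Valohai feature.

```python
import re

# Patterns that often mark a failure in Python-based steps.
# This list is an assumption for illustration; adjust it to your own logs.
ERROR_PATTERN = re.compile(r"Traceback|Error|Exception|Killed", re.IGNORECASE)


def find_error_lines(log_text):
    """Return (line_number, line) pairs for lines that look like errors."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if ERROR_PATTERN.search(line)
    ]


# Illustrative log excerpt
sample = (
    "Loading data\n"
    "Traceback (most recent call last):\n"
    "ModuleNotFoundError: No module named 'foo'\n"
)
for number, line in find_error_lines(sample):
    print(f"{number}: {line}")
```

To scan the file saved above, read `debug_logs.txt` into a string and pass it to the helper instead of `sample`.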

#### 3. Verify data flow

Ensure outputs exist and match expected names:

```python
# In your code, add debug output
import os

print("Files in output:", os.listdir("/valohai/outputs"))
```
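
Listing the directory shows what is there, but an explicit check makes a missing output obvious. Here is a minimal sketch, assuming you know which file names the downstream edges expect; `missing_outputs` and the file names are hypothetical.

```python
import os


def missing_outputs(expected_names, output_dir="/valohai/outputs"):
    """Return the expected file names that are not present in output_dir.

    If output_dir does not exist, every expected name is reported as missing.
    """
    present = set(os.listdir(output_dir)) if os.path.isdir(output_dir) else set()
    return [name for name in expected_names if name not in present]


# Hypothetical output names; replace them with the ones your edges reference.
missing = missing_outputs(["model.pkl", "metrics.json"])
if missing:
    print("Missing outputs:", missing)
```

Calling this at the end of a step puts the problem in that node's own logs, rather than leaving it to surface later as an edge error.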

### Preventing silent failures

#### Problem: Step fails but shows "Completed"

By default, Valohai runs all commands even if one fails, so a failure in an early command can go unnoticed:

```yaml
# Problematic configuration
command:
  - python preprocess.py     # Fails
  - python train.py          # Still runs!
  - python evaluate.py       # Also runs
```

#### Solution: Add error handling

```yaml
# Fail fast on any error
command:
  - set -e  # Exit on first error
  - python preprocess.py
  - python train.py
  - python evaluate.py
```

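Or stop explicitly after each command. `|| exit 1` ends the step as soon as a command returns a non-zero exit code, and `python -u` disables output buffering so log lines appear as they are written: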

```yaml
command:
  - python -u preprocess.py || exit 1
  - python -u train.py || exit 1
  - python -u evaluate.py || exit 1
```
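
Both approaches rely on the script itself returning a non-zero exit code. An uncaught Python exception does that automatically, but a script that catches exceptions and only logs them exits with 0. The sketch below shows one way to keep the exit code honest; `run_step` and `preprocess` are hypothetical names.

```python
import traceback


def run_step(main):
    """Run main() and return an exit code: 0 on success, 1 on any exception."""
    try:
        main()
    except Exception:
        # Print the full traceback so it appears in the execution logs.
        traceback.print_exc()
        return 1
    return 0


def preprocess():
    # Placeholder for real work; raising simulates a failure.
    raise ValueError("input file is empty")


exit_code = run_step(preprocess)
print("exit code:", exit_code)
# In a real script, finish with sys.exit(exit_code) so the shell sees the failure.
```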

### YAML configuration debugging

#### Lint before committing

Always validate your `valohai.yaml`:

```shell
vh lint

# Example error:
# error: PipelineParameter.__init__() missing 'targets'
```

#### Common YAML issues

**Missing targets**:

```yaml
# Wrong
parameters:
  - name: batch_size
    default: 32
```

```yaml
# Correct  
parameters:
  - name: batch_size
    targets:
      - train.parameters.batch_size
    default: 32
```
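
The same check can be expressed in a few lines of Python, which may be handy in a pre-commit hook or CI job. This is only a sketch and not a replacement for `vh lint`: `parameters_missing_targets` is a hypothetical helper, and it assumes the pipeline definition has already been loaded into a dictionary (for example with a YAML parser).

```python
def parameters_missing_targets(pipeline):
    """Return the names of pipeline parameters that define no targets."""
    return [
        param.get("name", "<unnamed>")
        for param in pipeline.get("parameters", [])
        if not param.get("targets")
    ]


# Illustrative pipeline definition, already parsed into a dict
pipeline = {
    "name": "example-pipeline",
    "parameters": [
        {"name": "batch_size", "default": 32},  # no targets
        {"name": "epochs", "targets": ["train.parameters.epochs"], "default": 10},
    ],
}
print(parameters_missing_targets(pipeline))
```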

**Incorrect indentation** (sibling entries must share the same indentation; use 2 spaces consistently throughout the file):

```yaml
# Wrong (3 spaces)
nodes:
   - name: train
```

```yaml
# Correct (2 spaces)
nodes:
  - name: train
```

### Advanced debugging techniques

#### 1. Add debug nodes

Insert lightweight debug nodes between steps:

```yaml
- name: debug-features
  type: execution
  step: debug-step
  command:
    - ls -la /valohai/inputs/
    - head -n 5 /valohai/inputs/features/*
    - echo "File count: $(ls /valohai/inputs/features | wc -l)"
```

#### 2. Use conditional debugging

Add debug output based on parameters:

```python
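import valohai

# Assumes the step defines a "debug" parameter and that X_train
# (for example a NumPy array) was loaded earlier in the script.
debug_mode = valohai.parameters("debug").value
if debug_mode:
    print("=== DEBUG: Input shapes ===")
    print(f"Training data: {X_train.shape}")
    print(f"First 5 samples:\n{X_train[:5]}")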
```

#### 3. Implement checkpoint logging

Log progress at key points:

```python
import json
from datetime import datetime


def log_checkpoint(stage, metrics):
    checkpoint = {
        "stage": stage,
        "timestamp": datetime.now().isoformat(),
        "metrics": metrics,
    }
    print(json.dumps(checkpoint))


# Usage (data and epochs are defined elsewhere in your script)
log_checkpoint("preprocessing_complete", {"samples": len(data)})
log_checkpoint("training_started", {"epochs": epochs})
```
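
Because each checkpoint is printed as a single JSON line, the logs can be scanned afterwards to see how far an execution got before it failed. This is a minimal sketch: `last_checkpoint` is a hypothetical helper, and the log lines are made up for illustration.

```python
import json


def last_checkpoint(log_lines):
    """Return the last checkpoint dict found in log_lines, or None."""
    last = None
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # Looked like JSON but was not; skip it.
        if isinstance(record, dict) and "stage" in record:
            last = record
    return last


# Illustrative log lines
logs = [
    "Loading data",
    '{"stage": "preprocessing_complete", "timestamp": "2024-01-01T00:00:00", "metrics": {"samples": 100}}',
    '{"stage": "training_started", "timestamp": "2024-01-01T00:05:00", "metrics": {"epochs": 5}}',
    "Traceback (most recent call last):",
]
checkpoint = last_checkpoint(logs)
print(checkpoint["stage"] if checkpoint else "no checkpoint found")
```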

### Quick reference: Error messages

| Error                       | Location      | Likely Cause           | Solution                                  |
| --------------------------- | ------------- | ---------------------- | ----------------------------------------- |
| "No such file or directory" | Node logs     | Missing input file     | Check edge definitions and output names   |
| "Out of memory"             | Node logs     | Insufficient resources | Use larger environment                    |
| "incompletable edges"       | Pipeline logs | Missing node outputs   | Verify source node completed successfully |
| "Module not found"          | Node logs     | Missing dependency     | Add to Docker image or pip install        |
| "Permission denied"         | Node logs     | File access issue      | Check file permissions in outputs         |
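
The table can also be turned into a small lookup for triaging saved logs. The sketch below copies the message fragments and hints from the table; `suggest_fix` is a hypothetical helper, and matching is a simple case-insensitive substring test.

```python
# Message fragments and hints taken from the quick-reference table.
KNOWN_ERRORS = [
    ("no such file or directory", "Missing input file: check edge definitions and output names"),
    ("out of memory", "Insufficient resources: use a larger environment"),
    ("incompletable edges", "Missing node outputs: verify the source node completed successfully"),
    ("module not found", "Missing dependency: add it to the Docker image or pip install it"),
    ("permission denied", "File access issue: check file permissions in outputs"),
]


def suggest_fix(log_line):
    """Return a hint for the first known fragment found in log_line, or None."""
    lowered = log_line.lower()
    for fragment, hint in KNOWN_ERRORS:
        if fragment in lowered:
            return hint
    return None


print(suggest_fix("Stopping due to 1 incompletable edges"))
```

Exact wording varies between tools. Python, for example, reports a missing package as `ModuleNotFoundError: No module named ...`, so extend the fragments to match what your own logs contain.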

### Best practices

1. **Always use `set -e`** in multi-command steps
2. **Validate YAML** before committing with `vh lint`
3. **Log liberally** during development
4. **Name outputs clearly** to avoid edge mismatches
5. **Test nodes individually** before pipeline integration
6. **Use version control** for configurations

