Debug Pipeline Failures

When a pipeline fails, the first step is to determine whether the issue is at the node level (execution failure) or the pipeline level (configuration error). This guide covers both types of failures and debugging strategies for each.

Understanding pipeline logs

Pipelines have two types of logs:

Node logs

Individual execution logs for each step:

  1. Click on any node in the pipeline graph

  2. View the execution details and logs

  3. Check Logs tab for error messages

Pipeline logs

System-level logs for the pipeline orchestration:

  1. Navigate to the pipeline view

  2. Click the Logs tab

  3. Look for configuration or dependency errors

Common failure patterns

Node execution failures

Symptom: Node shows as "Failed" in red

How to debug in the UI:

  1. Click the failed node

  2. Select "View execution"

  3. Check the Logs tab

Common causes:

  • Code errors (Python exceptions, import failures)

  • Out of memory or disk space

  • Missing dependencies in Docker image

  • Incorrect file paths

Pipeline configuration errors

Symptom: Pipeline fails to start or shows "Crashed"

Pipeline log messages and solutions:

Cause: Execution failed within the node
Fix: Check the node's execution logs for the actual error

Cause: Required inputs missing
Fix: Verify all edges are correctly defined and that source nodes produce the expected outputs

Cause: Specified environment doesn't exist or user lacks access
Fix: Check the environment slug with vh environments and verify permissions
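The environment slugs mentioned above can be listed with valohai-cli (assuming the vh tool is installed and logged in to the project):

```
# List the environments available to you, with their slugs
vh environments
```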

Step-by-step debugging process

1. Examine pipeline logs first

Look for orchestration issues:

  • Missing edges

  • Invalid node references

  • Parameter mismatches

  • Environment problems

2. Check individual node logs

For execution failures:

Via web interface:

  1. Click on the failed node (red) in the pipeline graph

  2. Click "View execution" in the popup

  3. Navigate to the "Logs" tab

  4. Use the log filters to show/hide stdout, stderr, or system logs

Via CLI:
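A sketch using valohai-cli's execution logs subcommand (assuming vh is installed and linked to the project; 42 below is a placeholder execution counter):

```
# Print the logs of execution #42 (replace with the failed node's counter)
vh execution logs 42
```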

3. Verify data flow

Ensure outputs exist and match expected names:
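One way to check from the terminal, assuming valohai-cli's execution outputs subcommand (42 is a placeholder execution counter):

```
# List the output files produced by execution #42,
# then compare the names against your edge definitions
vh execution outputs 42
```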

Preventing silent failures

Problem: A command in the step fails, but the execution still shows "Completed"

By default, Valohai runs all commands even if one fails:
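A hypothetical step illustrating the problem (step and file names are placeholders): if the first command fails, the second still runs, and the step can finish as "Completed":

```yaml
- step:
    name: train-and-evaluate
    image: python:3.11
    command:
      - python train.py     # if this command fails...
      - python evaluate.py  # ...this one still runs, and the step can complete
```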

Solution: Add error handling
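One option is to make the shell abort on the first failing command with set -e (same hypothetical step as above):

```yaml
- step:
    name: train-and-evaluate
    image: python:3.11
    command:
      - set -e              # abort on the first failing command
      - python train.py
      - python evaluate.py
```

Chaining the commands with && achieves the same effect.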

Or use Python-specific error handling:
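A minimal sketch: wrap the step's work so that any exception produces a non-zero exit code, which marks the execution as failed instead of letting it complete silently (the helper name fail_fast is hypothetical):

```python
import sys


def fail_fast(step):
    """Run a callable and exit non-zero on any exception,
    so the node is marked as failed rather than completing silently."""
    try:
        step()
    except Exception as exc:
        print(f"Step failed: {exc!r}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    # Replace the lambda with the real work of the step.
    fail_fast(lambda: print("training..."))
```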

YAML configuration debugging

Lint before committing

Always validate your valohai.yaml:
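A sketch, assuming valohai-cli is installed and run from the project directory:

```
# Validate valohai.yaml before committing
vh lint
```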

Common YAML issues

Missing targets:
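For instance, an edge whose target node or input name doesn't exist fails at configuration time. A sketch with hypothetical node and file names:

```yaml
- pipeline:
    name: training-pipeline
    edges:
      # Fails if the evaluate node has no input named "model",
      # or if the train node never produces model.pkl
      - [train.output.model.pkl, evaluate.input.model]
```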

Incorrect indentation:
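YAML is whitespace-sensitive, so a mis-indented key either fails to parse or silently changes the structure. A sketch of the mistake and the fix (step and file names are placeholders):

```yaml
# Wrong: "command" is nested under "image" instead of under the step
- step:
    name: train
    image: python:3.11
      command:
        - python train.py

# Right: "command" is a sibling of "name" and "image"
- step:
    name: train
    image: python:3.11
    command:
      - python train.py
```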

Advanced debugging techniques

1. Add debug nodes

Insert lightweight debug nodes between steps:
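A minimal debug step (names are hypothetical) that simply lists whatever arrives in its inputs, so you can confirm what a downstream node would actually receive:

```yaml
- step:
    name: debug-inspect
    image: python:3.11
    command:
      - ls -laR /valohai/inputs/
    inputs:
      - name: upstream
```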

2. Use conditional debugging

Add debug output based on parameters:
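A sketch using a hypothetical --debug parameter (you would declare a matching parameter in valohai.yaml) to gate verbose diagnostics:

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical parameter; declare a matching "debug" flag in valohai.yaml
    parser.add_argument("--debug", action="store_true")
    return parser.parse_args(argv)


def run(argv=None):
    args = parse_args(argv)
    if args.debug:
        # Extra diagnostics that are too noisy for normal runs
        print("debug: dumping input shapes and intermediate statistics")
    # ... the rest of the step's normal work


if __name__ == "__main__":
    run()
```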

3. Implement checkpoint logging

Log progress at key points:
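A sketch of a small checkpoint helper (the name checkpoint is hypothetical). It prints one JSON object per line to stdout, which Valohai collects as execution metadata, so checkpoints become visible in the UI:

```python
import json
import time


def checkpoint(name, **metrics):
    """Print a JSON line to stdout; Valohai collects JSON lines
    printed to stdout as execution metadata."""
    record = {"checkpoint": name, "timestamp": time.time(), **metrics}
    print(json.dumps(record))


checkpoint("data_loaded", rows=120_000)
checkpoint("epoch_done", epoch=1, loss=0.42)
```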

Quick reference: Error messages

Error | Location | Likely cause | Solution
"No such file or directory" | Node logs | Missing input file | Check edge definitions and output names
"Out of memory" | Node logs | Insufficient resources | Use a larger environment
"incompletable edges" | Pipeline logs | Missing node outputs | Verify the source node completed successfully
"Module not found" | Node logs | Missing dependency | Add it to the Docker image or install it with pip
"Permission denied" | Node logs | File access issue | Check file permissions in outputs

Best practices

  1. Always use set -e in multi-command steps

  2. Validate YAML before committing with vh lint

  3. Log liberally during development

  4. Name outputs clearly to avoid edge mismatches

  5. Test nodes individually before pipeline integration

  6. Use version control for configurations
