Pipeline Error Handling

Control how your pipeline responds to failures. By default, any node failure stops the entire pipeline, but you can customize this behavior to build more resilient workflows.

Why customize error handling?

The default behavior works well for critical paths where every step must succeed. But consider these scenarios:

  • Parallel model training: If 9 out of 10 hyperparameter combinations succeed, you want the best model, not a failed pipeline

  • Data quality checks: Optional validation that shouldn't block core processing

  • A/B testing: One model variant failing shouldn't prevent evaluating others

  • Batch processing: A few failed items shouldn't stop processing thousands of others

Error handling strategies

stop-all (default)

Any failure stops the entire pipeline immediately.

pipeline:
  name: critical-pipeline
  nodes:
    - name: validate-data
      type: execution
      step: data-validation
      # on-error: stop-all  # implicit

Use when: Every step is critical to the final output.

continue

The node is treated as complete even if some of its work fails, and downstream nodes still run.

Use when: You expect some failures and want to collect all successful results.
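A minimal sketch of this strategy, mirroring the stop-all example above (the node and step names are illustrative; only the `on-error` field is taken from this page):

```yaml
pipeline:
  name: resilient-pipeline
  nodes:
    - name: enrich-data
      type: execution
      step: data-enrichment
      on-error: continue  # downstream nodes run even if this node fails
```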

stop-next

A failed node blocks its dependents but allows parallel branches to continue.

Use when: This branch is optional, but if it runs, subsequent steps need its output.
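As a hedged sketch (names are illustrative), an optional branch might declare:

```yaml
pipeline:
  name: branched-pipeline
  nodes:
    - name: quality-report
      type: execution
      step: quality-reporting
      on-error: stop-next  # blocks only this branch's dependents;
                           # parallel branches keep running
```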

Task node considerations

Error handling is especially important for task nodes running parallel executions:

With on-error: continue:

  • 3 models train successfully with reasonable learning rates

  • 1 fails with lr=1.0

  • select-best receives 3 models and picks the best

  • Pipeline succeeds overall
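The scenario above might be configured as follows. This is a sketch assuming a task node fans out parallel training executions; the node and step names are illustrative:

```yaml
pipeline:
  name: hyperparameter-sweep
  nodes:
    - name: train-models
      type: task
      step: model-training
      on-error: continue  # one bad learning rate shouldn't fail the sweep
    - name: select-best
      type: execution
      step: model-selection  # receives only the successful models
```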

Practical example

Here's a pipeline that handles failures gracefully:
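A sketch combining the strategies above (node and step names are illustrative; only the `on-error` values are taken from this page):

```yaml
pipeline:
  name: graceful-pipeline
  nodes:
    - name: validate-data
      type: execution
      step: data-validation
      # on-error: stop-all is implicit: bad input should stop everything
    - name: quality-report
      type: execution
      step: quality-reporting
      on-error: stop-next  # optional branch; failure blocks only its dependents
    - name: train-models
      type: task
      step: model-training
      on-error: continue   # keep whichever training runs succeed
    - name: select-best
      type: execution
      step: model-selection
```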

Debugging failed pipelines

View execution logs

Check individual execution logs to understand failures:

  1. Click on the failed node in the pipeline graph

  2. Select the failed execution

  3. Review logs for error messages
