Pass Data Between Pipeline Nodes

Pipeline edges define how data flows between nodes. This guide covers the three types of connections: file outputs, parameters, and metadata.

File outputs to inputs

The most common pipeline pattern: one node produces files, the next consumes them.

Basic file passing

- step:
    name: train-model
    image: tensorflow/tensorflow:2.4.1
    command: python train.py

- step:
    name: test-model
    image: tensorflow/tensorflow:2.4.1
    command: python test.py
    inputs:
      - name: model

- pipeline:
    name: training-pipeline
    nodes:
      - name: train-model
        step: train-model
        type: execution
      - name: test-model
        step: test-model
        type: execution
    edges:
      - [train-model.output.model.pkl, test-model.input.model]

The train-model step saves model.pkl to /valohai/outputs/. The edge passes this file to test-model as its model input, where it will be available at /valohai/inputs/model/model.pkl.

You can also reference files inside subdirectories in an edge. For example, if train-model saves model.pkl to a subdirectory called results, i.e. the path /valohai/outputs/results/model.pkl, the edge would look like this:
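
edges:
  - [train-model.output.results/model.pkl, test-model.input.model]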

💡 Use wildcards to pass multiple files: train-model.output.*.pkl passes all pickle files. For files under a subdirectory called results, use train-model.output.results/*.pkl.

Edge merge modes

Control how edge data interacts with a node's default inputs by setting edge-merge-mode (see the sketch after the list):

Merge modes:

  • replace (default): Edge data replaces default inputs

  • append: Edge data adds to default inputs
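
A minimal sketch, assuming edge-merge-mode is set per node in the pipeline definition:

- pipeline:
    name: training-pipeline
    nodes:
      - name: train-model
        step: train-model
        type: execution
      - name: test-model
        step: test-model
        type: execution
        edge-merge-mode: append
    edges:
      - [train-model.output.model.pkl, test-model.input.model]

With append, the file from train-model is added alongside test-model's default model input instead of replacing it.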

Parameter passing

Share configuration values between nodes without modifying code.

Static parameter passing

Pass a parameter value from one node to another:
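
A minimal sketch, assuming both steps declare a user-id parameter (the default value is illustrative):

- step:
    name: train-model
    image: tensorflow/tensorflow:2.4.1
    command: python train.py {parameters}
    parameters:
      - name: user-id
        type: string
        default: "anonymous"

# The test-model step declares the same user-id parameter.

- pipeline:
    name: training-pipeline
    nodes:
      - name: train-model
        step: train-model
        type: execution
      - name: test-model
        step: test-model
        type: execution
    edges:
      - [train-model.parameter.user-id, test-model.parameter.user-id]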

The test-model node inherits the user-id value from train-model.

Metadata to parameters

Use runtime-generated values to configure downstream nodes.

Single value metadata

Generate metadata in your code:
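
For example, a single-line JSON object printed to stdout is collected as execution metadata (the value is illustrative):

import json

# Valohai collects each JSON object printed to stdout
# on its own line as metadata for this execution.
best_user_id = "1234"  # determined at runtime
print(json.dumps({"user-id": best_user_id}))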

Pass it to the next node:
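
edges:
  - [train-model.metadata.user-id, test-model.parameter.user-id]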

Multi-value metadata for tasks

Generate multiple values to create parallel task executions:
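
A sketch, assuming the upstream code logs a list under a user-ids key (the IDs are illustrative):

import json

# A list value lets a downstream task node fan out
# one execution per item in the list.
user_ids = ["101", "102", "103", "104"]
print(json.dumps({"user-ids": user_ids}))

Then wire the list into a node declared with type: task:

- pipeline:
    name: training-pipeline
    nodes:
      - name: train-model
        step: train-model
        type: execution
      - name: test-model
        step: test-model
        type: task
    edges:
      - [train-model.metadata.user-ids, test-model.parameter.user-id]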

💡 Tasks create one execution per value in the metadata list. With 4 user IDs, you get 4 parallel executions.

For multi-dimensional parameters:
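
A hedged sketch, assuming each list-valued metadata key feeds its own task parameter (the learning-rates key and learning-rate parameter are hypothetical names):

edges:
  - [train-model.metadata.user-ids, test-model.parameter.user-id]
  - [train-model.metadata.learning-rates, test-model.parameter.learning-rate]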

Common issues and fixes

Parameter not passed

Symptom: Downstream node uses default value instead of edge value

Fix: Verify parameter names match exactly between edge definition and step parameters:
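
For example, the name declared on the step must be identical to the name used as the edge target:

- step:
    name: test-model
    image: tensorflow/tensorflow:2.4.1
    command: python test.py {parameters}
    parameters:
      - name: user-id   # this name...

edges:
  - [train-model.parameter.user-id, test-model.parameter.user-id]   # ...must match the target here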

Outputs not available

Error: FileNotFoundError: /valohai/inputs/model/result.csv

Fix: Ensure the upstream node saves files to /valohai/outputs/:
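
A minimal sketch: write or copy the file under /valohai/outputs/ before the execution finishes:

import shutil

# Only files under /valohai/outputs/ are uploaded when the
# execution finishes and can be passed along an edge.
shutil.copy("result.csv", "/valohai/outputs/result.csv")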

Conditional output handling

Issue: Optional outputs cause downstream failures

Fix: Add existence checks in consuming nodes:
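
A minimal sketch; evaluate() stands in for your own processing code:

import os

model_path = "/valohai/inputs/model/model.pkl"
if os.path.exists(model_path):
    evaluate(model_path)  # hypothetical processing function
else:
    print("Optional model input missing; skipping evaluation")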

Metadata not recognized

Symptom: Metadata edge doesn't populate parameter

Fix: Ensure metadata is valid JSON printed to stdout:
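
For example (the metric name and value are illustrative):

import json

# Collected as metadata: a valid JSON object on its own stdout line.
print(json.dumps({"accuracy": 0.95}))

# Not collected: plain text is not valid JSON.
print("accuracy = 0.95")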

Best practices

  1. Name outputs descriptively: Use model.pkl instead of output.pkl

  2. Validate inputs exist: Always check for file existence in consuming nodes

  3. Log metadata early: Print metadata as soon as values are determined

  4. Use type-specific nodes:

    • execution for single runs

    • task for parallel processing with metadata lists
