In our example below, the train-model step is responsible for training a model file and saving it to /valohai/outputs/model.pkl. This file is preserved, uploaded to your storage, and versioned.
In the edge definition, we specify that we intend to pass this generated file to the test-model node and position it as the model input.
- step:
    name: train-model
    image: tensorflow/tensorflow:2.4.1
    command: python train.py
- step:
    name: test-model
    image: tensorflow/tensorflow:2.4.1
    command: python test.py
    inputs:
      - name: model
- pipeline:
    name: Data example
    nodes:
      - name: train-model
        step: train-model
        type: execution
      - name: test-model
        step: test-model
        type: execution
    edges:
      - [train-model.output.model.pkl, test-model.input.model]
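As a rough sketch of what the two scripts might do with these paths: the example assumes the model is a picklable Python object and that the file passed along the edge is delivered under /valohai/inputs/model/ (a directory named after the input). The helpers save_model and load_model and their directory parameters are hypothetical; they exist only so the sketch can also run outside Valohai.

```python
import pickle
from pathlib import Path

# Paths from the text; the directory arguments below default to them.
OUTPUTS_DIR = Path("/valohai/outputs")
INPUTS_DIR = Path("/valohai/inputs")  # assumed layout: <inputs>/<input name>/<file>


def save_model(model, outputs_dir=OUTPUTS_DIR):
    """train.py side: write model.pkl into the outputs directory."""
    outputs_dir = Path(outputs_dir)
    outputs_dir.mkdir(parents=True, exist_ok=True)
    path = outputs_dir / "model.pkl"
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def load_model(inputs_dir=INPUTS_DIR, input_name="model"):
    """test.py side: read the first file delivered to the named input."""
    input_dir = Path(inputs_dir) / input_name
    files = sorted(p for p in input_dir.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"no files found in {input_dir}")
    with files[0].open("rb") as f:
        return pickle.load(f)
```

Reading whatever file sits in the input directory, rather than hard-coding model.pkl, keeps test.py independent of the file name chosen by the upstream node.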
Edge merge mode
By default, the inputs coming from the previous node override the default inputs of the step used to define the node. In some cases, you might want to append the inputs from the previous node to the default ones instead. To do this, set the pipeline property edge-merge-mode to append.
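The difference between the two modes can be modelled in a few lines of Python. This is only an illustration of the behaviour described above, not Valohai's implementation; the function resolve_input_files and its parameters are hypothetical, and it covers a single input that has an incoming edge.

```python
def resolve_input_files(defaults, from_edge, edge_merge_mode="replace"):
    """Return the files an input ends up with (illustrative model only)."""
    if edge_merge_mode == "replace":
        # Default behaviour: files from the edge override the step defaults.
        return list(from_edge)
    if edge_merge_mode == "append":
        # Step defaults are kept and the edge files are added after them.
        return list(defaults) + list(from_edge)
    raise ValueError(f"unknown edge-merge-mode: {edge_merge_mode!r}")
```

For example, with one default file and one file from the previous node, replace yields only the edge file, while append yields both.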
In the pipeline example below, the train-model step has an input called preprocessed_dataset with a default value. Since edge-merge-mode is set to append in the pipeline, the files coming from the preprocess node are appended to the original files. If edge-merge-mode were not defined (or were set to replace), the model training would be done with only the files coming from the preprocess node.
- step:
    name: preprocess
    image: python:3.10
    inputs:
      - name: dataset
        default: s3://mybucket/data/*
    command: python preprocess.py
- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command: python train.py
    inputs:
      - name: preprocessed_dataset
        default: s3://mybucket/preprocessed_data/*
- pipeline:
    name: Edge merge mode example
    nodes:
      - name: preprocess
        step: preprocess
        type: execution
      - name: train-model
        step: train-model
        type: execution
        edge-merge-mode: append
    edges:
      - [preprocess.output.*, train-model.input.preprocessed_dataset]
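From the point of view of train.py, the merge mode only changes which files show up for the preprocessed_dataset input. A minimal sketch of collecting them, assuming the files of an input are placed under /valohai/inputs/<input name>/; the helper list_input_files and its inputs_dir parameter are hypothetical.

```python
from pathlib import Path


def list_input_files(input_name, inputs_dir="/valohai/inputs"):
    """Return every file delivered to one named input, sorted by path."""
    input_dir = Path(inputs_dir) / input_name  # assumed directory layout
    return sorted(p for p in input_dir.rglob("*") if p.is_file())
```

Inside train.py this could be called as list_input_files("preprocessed_dataset"); the script itself does not need to know whether a file came from the step default or from the preprocess node.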