In most pipelines, the edges between nodes are defined by outputs and inputs. Edges can also be used to pass parameters between nodes.
- step:
    name: train-model
    image: tensorflow/tensorflow:2.4.1
    command: python train.py {parameters}
    parameters:
      - name: user-id
        default: 345345
        multiple-separator: ','
        optional: false
        type: integer
- step:
    name: test-model
    image: tensorflow/tensorflow:2.4.1
    command: python train.py {parameters}
    parameters:
      - name: user-id
        default: 3
        multiple-separator: ','
        optional: false
        type: integer
- pipeline:
    name: Parameter to parameter
    nodes:
      - name: train-model
        step: train-model
        type: execution
      - name: test-model
        step: test-model
        type: execution
    edges:
      - [train-model.parameter.user-id, test-model.parameter.user-id]
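On the receiving side, the script has to accept the value the edge delivers. The sketch below assumes Valohai renders the `{parameters}` placeholder as flags such as `--user-id=345345` and reads that flag with `argparse`; the `parse_user_id` helper is hypothetical and only meant as an illustration.

```python
import argparse


def parse_user_id(argv=None):
    """Read the user-id parameter passed on the command line (hypothetical helper).

    Assumes {parameters} expands to flags such as --user-id=345345.
    """
    parser = argparse.ArgumentParser(description="Hypothetical train/test entry point")
    # argparse exposes --user-id as the attribute user_id and converts it to int
    parser.add_argument("--user-id", type=int, required=True)
    # parse_known_args tolerates any other parameters the step might define
    args, _unknown = parser.parse_known_args(argv)
    return args.user_id
```

Inside an execution you would call `parse_user_id()` with no arguments so that it reads `sys.argv`; passing an explicit list, as in `parse_user_id(["--user-id=3"])`, is convenient for local testing.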
Dynamic parameter value
Another option is to let the train-model step determine the value of user-id for the test-model step.
In this example, we assume that train.py produces Valohai metadata containing the user ID by printing it as JSON, like so:
import json
# Valohai reads JSON objects printed to standard output as execution metadata
print(json.dumps({"user": 463}))
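A slightly fuller version of the metadata-emitting part of train.py might look like the sketch below. The `emit_metadata` helper and the `pick_user_id` placeholder are hypothetical; the only behavior relied on is that a JSON object printed on its own line of standard output is picked up as metadata.

```python
import json


def emit_metadata(**values):
    """Print the given values as one JSON object on a single stdout line."""
    line = json.dumps(values)
    # flush so the line reaches the execution log promptly during long runs
    print(line, flush=True)
    return line


def pick_user_id():
    # Hypothetical placeholder: real code would derive this from training results.
    return 463


if __name__ == "__main__":
    emit_metadata(user=pick_user_id())
```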
We can now use this programmatically generated value to define user-id in the test-model node by creating a metadata-to-parameter edge.
- step:
    name: train-model
    image: tensorflow/tensorflow:2.4.1
    command: python train.py {parameters}
- step:
    name: test-model
    image: tensorflow/tensorflow:2.4.1
    command: python train.py {parameters}
    parameters:
      - name: user-id
        default: 3
        optional: false
        type: integer
- pipeline:
    name: Parameter to parameter
    nodes:
      - name: train-model
        step: train-model
        type: execution
      - name: test-model
        step: test-model
        type: task
    edges:
      - [train-model.metadata.user, test-model.parameter.user-id]
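To reason about what this edge does, the following local simulation scans a node's standard output for JSON metadata lines and turns the `user` key into a `--user-id` flag for the next node. It is a hypothetical illustration, not Valohai's implementation; the helper names are made up, and it assumes that when a key is printed several times the last value wins.

```python
import json


def collect_metadata(stdout):
    """Merge every JSON-object line into one dict (assumption: later lines override earlier ones)."""
    metadata = {}
    for line in stdout.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # ordinary log output, not metadata
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # looked like JSON but was not valid
        if isinstance(parsed, dict):
            metadata.update(parsed)
    return metadata


def apply_edge(stdout, metadata_key, parameter_name):
    """Simulate a [source.metadata.<key>, target.parameter.<name>] edge for a single value."""
    value = collect_metadata(stdout)[metadata_key]
    return f"--{parameter_name}={value}"


train_stdout = 'Epoch 1 done\n{"user": 463}\nTraining finished\n'
print(apply_edge(train_stdout, "user", "user-id"))  # prints --user-id=463
```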
Multi-value parameters can also be passed through metadata as a list. However, each parameter can be filled by only one edge, so multiple nodes can’t contribute values to the same parameter, even if the parameter accepts multiple values.
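The one-edge-per-parameter rule can be checked before submitting a pipeline. The sketch below is a hypothetical pre-flight check (not part of any Valohai tool) that takes the `edges` list as plain Python data and reports every target parameter fed by more than one edge. The `prepare-data` node is invented for the example.

```python
from collections import defaultdict


def find_conflicting_edges(edges):
    """Return {target: [sources]} for each target parameter fed by more than one edge.

    Each edge is a [source, target] pair in the dotted node.kind.name notation
    used in valohai.yaml. This sketch only checks parameter targets.
    """
    sources_by_target = defaultdict(list)
    for source, target in edges:
        if ".parameter." in target:
            sources_by_target[target].append(source)
    return {t: s for t, s in sources_by_target.items() if len(s) > 1}


edges = [
    ["train-model.metadata.user", "test-model.parameter.user-id"],
    # hypothetical second node trying to fill the same parameter: not allowed
    ["prepare-data.metadata.user", "test-model.parameter.user-id"],
]
print(find_conflicting_edges(edges))
```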
Parallel jobs
In our example, the test-model node is configured as a task (type: task). A task can run a series of parallel executions within the node, each with its own distinct user-id value.
To achieve this, modify train.py to print the metadata value as a list, like so:
print(json.dumps({"user": [463, 674, 888, 233]}))
Multi-value parameters can also be passed to the task as a two-dimensional metadata list, so that each run receives a list of values:
print(json.dumps({"user": [[204, 302], [593, 120]]}))
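How a task might turn such metadata into individual runs can be sketched locally as follows. This is a hypothetical model, assuming one execution per top-level list element: a flat list gives each execution a single `user-id`, while a list of lists gives each execution a multi-value `user-id`, joined here with `,` to echo the `multiple-separator: ','` setting from the first example. The function name is invented, and Valohai's actual expansion logic and flag rendering may differ.

```python
import json


def plan_task_runs(metadata_value, parameter_name, separator=","):
    """Return one command-line flag per planned execution (hypothetical model).

    A flat list yields single values; a list of lists yields one multi-value
    parameter per execution, joined with `separator` (an assumption).
    """
    if not isinstance(metadata_value, list):
        metadata_value = [metadata_value]  # a scalar means a single run
    flags = []
    for item in metadata_value:
        if isinstance(item, list):
            value = separator.join(str(v) for v in item)
        else:
            value = str(item)
        flags.append(f"--{parameter_name}={value}")
    return flags


flat = json.loads('{"user": [463, 674, 888, 233]}')["user"]
nested = json.loads('{"user": [[204, 302], [593, 120]]}')["user"]
print(plan_task_runs(flat, "user-id"))    # four runs, one value each
print(plan_task_runs(nested, "user-id"))  # two runs, two values each
```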