Valohai pipelines let you run version-controlled, interconnected jobs as a single logical flow. A pipeline can serve as a (re-)training pipeline for models, a batch inference pipeline, or a data processing pipeline.
Nodes and edges
Pipeline graphs consist of nodes and edges.
Nodes
Nodes within a pipeline are individual jobs that receive or produce data:
- Nodes can be executions, tasks, or deployments.
- Each node starts automatically once all of its prerequisites are met.
- Each node maintains a list of "edges" (explained below).
Edges
Edges in the pipeline, depicted as lines connecting nodes, can be:
- Output files used as inputs for subsequent executions or deployments.
- Input files passed from one node to another, so multiple pipeline nodes can share the same inputs.
- Parameters passed from one node to another.
- Metadata produced in one node and passed to a parameter of the following node.
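In valohai.yaml, an edge is written as a `[source, target]` pair using dot notation: node name, then the edge type (`output`, `input`, `parameter`, or `metadata`), then the item name. For example, an output-to-input edge (taken from the full example further down this page) looks like:

```yaml
edges:
  # <source-node>.output.<file-name> feeds <target-node>.input.<input-name>
  - [preprocess.output.preprocessed_mnist.npz, train.input.dataset]
```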
Workflow
In a typical workflow, you’d start by defining individual jobs that will later become part of your pipeline. Initially, you run these jobs independently to ensure they function correctly on their own. After that, you create a pipeline in Valohai to specify the steps and determine how data and parameters will move between them.
Example pipeline in valohai.yaml
```yaml
- step:
    name: preprocess-dataset
    image: python:3.9
    command:
      - pip install numpy valohai-utils
      - python ./preprocess_dataset.py
    inputs:
      - name: dataset
        default: https://valohaidemo.blob.core.windows.net/mnist/mnist.npz

- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command:
      - pip install valohai-utils
      - python ./train_model.py {parameters}
    parameters:
      - name: epochs
        default: 5
        type: integer
      - name: learning_rate
        default: 0.001
        type: float
    inputs:
      - name: dataset
        default: https://valohaidemo.blob.core.windows.net/mnist/preprocessed_mnist.npz

- step:
    name: batch-inference
    image: tensorflow/tensorflow:2.6.0
    command:
      - pip install pillow valohai-utils
      - python ./batch_inference.py
    inputs:
      - name: model
      - name: images
        default:
          - https://valohaidemo.blob.core.windows.net/mnist/four-inverted.png
          - https://valohaidemo.blob.core.windows.net/mnist/five-inverted.png
          - https://valohaidemo.blob.core.windows.net/mnist/five-normal.jpg

- pipeline:
    name: training-pipeline
    nodes:
      - name: preprocess
        type: execution
        step: preprocess-dataset
      - name: train
        type: execution
        step: train-model
        override:
          inputs:
            - name: dataset
      - name: evaluate
        type: execution
        step: batch-inference
    edges:
      - [preprocess.output.preprocessed_mnist.npz, train.input.dataset]
      - [train.output.model*, evaluate.input.model]
```
Creating a pipeline
Web application
- Open your project.
- Go to the Pipelines tab.
- Click "Create pipeline".
- Choose the blueprint (based on the pipelines in your valohai.yaml).
- Edit any configuration (parameters, data, etc.).
- Launch the pipeline.
Command-line
```shell
vh pipeline run training-pipeline --adhoc
```

The `--adhoc` flag packages and uploads the contents of your current working directory instead of running from a committed version of your repository.
REST API
You can invoke a pipeline through the Valohai REST API by sending a POST request to https://app.valohai.com/api/v0/pipelines/ with a JSON payload.
Example payload:
```json
{
  "edges": [
    {
      "source_node": "preprocess",
      "source_key": "preprocessed_mnist.npz",
      "source_type": "output",
      "target_node": "train",
      "target_type": "input",
      "target_key": "dataset"
    },
    {
      "source_node": "train",
      "source_key": "model*",
      "source_type": "output",
      "target_node": "evaluate",
      "target_type": "input",
      "target_key": "model"
    }
  ],
  "nodes": [
    {
      "name": "preprocess",
      "type": "execution",
      "template": {
        "environment": "0167d05d-a1d7-cc02-8256-6455a6ecfa56",
        "commit": "17aa47f06f60c678c78e4b4389a7d30838b04401",
        "step": "preprocess-dataset",
        "image": "python:3.9",
        "command": "pip install numpy valohai-utils\npython ./preprocess_dataset.py",
        "inputs": {
          "dataset": [
            "https://valohaidemo.blob.core.windows.net/mnist/mnist.npz"
          ]
        }
      }
    },
    {
      "name": "train",
      "type": "execution",
      "template": {
        "commit": "main",
        "step": "train-model",
        "image": "tensorflow/tensorflow:2.6.0",
        "command": "pip install valohai-utils\npython ./train_model.py {parameters}",
        "inputs": {
          "dataset": [
            "https://valohaidemo.blob.core.windows.net/mnist/preprocessed_mnist.npz"
          ]
        },
        "parameters": {
          "epochs": 5,
          "learning_rate": 0.001
        }
      },
      "on_error": "stop-all"
    },
    {
      "name": "evaluate",
      "type": "execution",
      "template": {
        "commit": "main",
        "step": "batch-inference"
      },
      "on_error": "stop-all"
    }
  ],
  "project": "01774560-c649-7f96-bd60-c81a1c210190",
  "title": "training-pipeline"
}
```
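A minimal Python sketch of making this call, using only the standard library. The `VALOHAI_API_TOKEN` environment variable and the `run_pipeline` helper name are illustrative assumptions; the `Authorization: Token <token>` header is Valohai's API token authentication scheme, but check the API reference for the exact response shape:

```python
import json
import os
from urllib import request

API_URL = "https://app.valohai.com/api/v0/pipelines/"


def build_pipeline_request(token: str, payload: dict):
    """Build the URL, headers, and JSON body for the pipeline-creation POST."""
    headers = {
        "Authorization": f"Token {token}",  # Valohai API token auth
        "Content-Type": "application/json",
    }
    return API_URL, headers, json.dumps(payload)


def run_pipeline(payload: dict) -> dict:
    """POST the payload (the JSON document shown above) and return the response."""
    # Assumes a personal API token in VALOHAI_API_TOKEN (illustrative convention).
    token = os.environ["VALOHAI_API_TOKEN"]
    url, headers, body = build_pipeline_request(token, payload)
    req = request.Request(url, data=body.encode(), headers=headers, method="POST")
    with request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


# Usage (requires a valid token, project ID, and commit):
#   result = run_pipeline(payload)   # payload = the JSON document above
```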