You can add conditions to your pipeline, for example to stop the pipeline if certain metrics fall below an expected range, or to pause the pipeline until a human has reviewed the results.
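As a sketch of what such a condition can look like, a pipeline node in `valohai.yaml` can carry an `actions` block. The `metadata.accuracy` key and the `0.9` threshold below are made-up values for illustration:

```yaml
nodes:
  - name: train-model-node
    type: execution
    step: train-model
    actions:
      # Stop the whole pipeline if the logged accuracy metric
      # (a hypothetical metadata key) falls below the threshold.
      - when: node-complete
        if: metadata.accuracy < 0.9
        then: stop-pipeline
  - name: batch-inference-node
    type: execution
    step: batch-inference
    actions:
      # Pause here until a human approves the results so far.
      - when: node-starting
        then: require-approval
```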
This guide is just a short recap: it gives you an overview and a “cheatsheet” for migrating projects, but it won’t explain the concepts and all the options in detail. We strongly recommend completing the Mastering Valohai learning path on Valohai Academy.
A Valohai pipeline consists of:

- **nodes** that represent a single job inside the pipeline. Most commonly this is an execution, but it could also be a task or a real-time deployment.
- **edges** that represent how the output of one job connects to the input of another job, for example output file -> input file, or metadata value -> parameter value.
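For instance, the two kinds of edges can be sketched like this (the node names, the metadata key, and the parameter name below are hypothetical):

```yaml
edges:
  # File edge: a node's output files feed another node's input.
  - [train-node.outputs.*, inference-node.inputs.model]
  # Metadata edge: a metric logged by one node becomes a parameter of the next.
  - [tune-node.metadata.best_learning_rate, train-node.parameters.learning_rate]
```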
Before you can define a pipeline, you’ll need to define steps and make sure they work as expected.
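A common way to verify a step is to run it on its own as an ad hoc execution from your local code with the Valohai CLI, for example (using a hypothetical `preprocess` step name):

```shell
# Run a single step as an ad hoc execution, using local code
# instead of a committed version.
vh execution run preprocess --adhoc
```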
Write a valohai.yaml
A pipeline is defined in the `valohai.yaml`. Let’s assume we already have 3 working steps, and we want to connect them together into a pipeline.
- `preprocess` will output a new dataset (`dataset://images/latest-train`) that will be connected to the `train-model` step.
- `train-model` will generate a new model file that will be used for `batch-inference`.
- `batch-inference` will output the results.
```yaml
- step:
    name: preprocess
    image: docker.io/python:3.10
    command:
      - pip install -r requirements.txt
      - python data-preprocess.py
- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command:
      - python train_model.py
    inputs:
      - name: data
        default: dataset://images/latest-train
- step:
    name: batch-inference
    image: tensorflow/tensorflow:2.6.0
    command:
      - pip install pillow
      - python batch_inference.py
    inputs:
      - name: test-images
        default: dataset://images/latest-test
      - name: model
        default: datum://production-latest
- pipeline:
    name: train-inference-pipeline
    nodes:
      - name: preprocess-node
        type: execution
        step: preprocess
      - name: train-model-node
        type: execution
        step: train-model
      - name: batch-inference-node
        type: execution
        step: batch-inference
    edges:
      - [preprocess-node.outputs.*, train-model-node.inputs.data]
      - [train-model-node.outputs.*.pkl, batch-inference-node.inputs.model]
```
You can now run your pipeline from your local code (adhoc) with:
```shell
vh pipeline run train-inference-pipeline --adhoc
```
valohai-utils

`valohai-utils` users can define pipelines using Python.
You can create a new file called `example_pipeline.py`:
```python
from valohai import Pipeline


def main(config) -> Pipeline:
    # Create a pipeline called "train-inference-pipeline".
    pipe = Pipeline(name="train-inference-pipeline", config=config)

    # Define the pipeline nodes.
    preprocess = pipe.execution("preprocess")
    train = pipe.execution("train-model")
    inference = pipe.execution("batch-inference")

    # Configure the pipeline, i.e. define the edges.
    preprocess.output("*").to(train.input("data"))
    train.output("*.pkl").to(inference.input("model"))

    return pipe
```
You can now generate the pipeline’s YAML definition with:
```shell
vh yaml pipeline example_pipeline.py
```