Pipelines automate your machine learning operations in the Valohai ecosystem.
You can read more about the reasoning behind general pipeline concepts like graphs, nodes and edges on the Pipelines core concepts page.
A pipeline definition has three required properties:

- `name`: the name of the pipeline
- `nodes`: a list of all nodes (executions) in the pipeline
- `edges`: a list of all edges (requirements) between the nodes
A simple pipeline could look something like this:
```yaml
---

- step:
    name: generate-dataset
    image: python:3.6
    command: python preprocess.py

- step:
    name: train-model
    image: tensorflow/tensorflow:2.2.0-gpu
    command: python train.py
    inputs:
      - name: dataset-images
        default: http://...
      - name: dataset-labels
        default: http://...

- pipeline:
    name: simple-pipeline
    nodes:
      - name: generate
        type: execution
        step: generate-dataset
      - name: train
        type: execution
        step: train-model
    edges:
      - [generate.output.images*, train.input.dataset-images]
      - [generate.output.labels*, train.input.dataset-labels]
```
Here we have a pipeline with two nodes. The second node, `train`, waits for its inputs to be generated by the `generate` node. All files in `/valohai/outputs` that start with either `images` or `labels` will be passed between the executions.
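The edge sources above use wildcard patterns (`images*`, `labels*`) to select which output files flow to the next node. As a rough illustration of that matching behavior (a sketch using Python's `fnmatch`, not Valohai's actual implementation; the filenames are hypothetical):

```python
from fnmatch import fnmatch

# Hypothetical filenames written to /valohai/outputs by the generate node
outputs = ["images-train.zip", "images-test.zip", "labels.csv", "log.txt"]

# Edge source patterns from the pipeline definition
patterns = ["images*", "labels*"]

# Files whose names match either pattern are passed on to the train node
passed = [f for f in outputs if any(fnmatch(f, p) for p in patterns)]
print(passed)  # ['images-train.zip', 'images-test.zip', 'labels.csv']
```

Note that `log.txt` matches neither pattern, so it would not be forwarded along the edges.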
Override default inputs
In the above example:

- The `train-model` step has two inputs, each with its own default value.
- The pipeline defines that the `train-model` node should use the outputs of `generate-dataset` as its inputs.
By default, Valohai will include both the files from the default input location and the files generated by the pipeline as the node's inputs. If you instead want the input from the pipeline to replace the default input, you can specify an override in the pipeline definition.
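The difference between the two behaviors can be sketched as follows (a conceptual illustration only; the URLs are hypothetical placeholders, not real Valohai data references):

```python
# Hypothetical input sources for the train-model node
default_inputs = ["s3://bucket/default-images.zip"]   # from the step's default value
pipeline_inputs = ["datum://generate/images.zip"]     # produced by the generate node

# Default behavior: files from both sources are included as inputs
merged = default_inputs + pipeline_inputs

# With an override: only the pipeline-provided files are used
overridden = pipeline_inputs

print(merged)      # both the default file and the pipeline output
print(overridden)  # only the pipeline output
```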
```yaml
- name: train
  type: execution
  step: train-model
  override:
    inputs:
      - name: dataset-images
      - name: dataset-labels
```