Pipelines

Note

Valohai pipelines are in private beta. Send an email to info@valohai.com for details.

See also

For technical specifications, go to valohai.yaml pipeline section.

A pipeline is a version-controlled collection of executions, some of which rely on the results of previous executions, thus forming a directed graph. These pipeline graphs consist of nodes and edges, which we'll discuss further down.

For example, consider the following sequence of data science operations:

  1. preprocess dataset on a memory-optimized machine
  2. train multiple machine learning models on GPU machines using the preprocessed data (1)
  3. evaluate all of the trained models (2) to find the best one

We could say that you have 3 separate operations or steps: preprocess, train, evaluate.

This pipeline would have 3 or more nodes: at least one for each step mentioned above. Training could have additional nodes if you are training in parallel, but let's keep it simple.


Nodes of the pipeline (the circles that receive something and/or produce something):

  • Each node has a list of requirements (“edges”, explained further below).
  • Each node starts automatically when all of its requirements have been met.
  • Nodes are currently exclusively Valohai executions.
  • More node types are planned, e.g. deployments and integrations with other services.

Edges of the pipeline (the lines between nodes) can be:

  • output files used as an input of an upcoming execution
  • metadata used as a parameter of an upcoming execution (in development)
  • a parameter of a previous execution copied to an upcoming execution (in development)

You can manage pipelines under the Pipelines tab on the web user interface if the feature has been enabled for your account and you have a pipeline defined in your valohai.yaml.

Full documentation on how to define pipelines can be found under the valohai.yaml pipeline section, but here is a brief overview of what the above example pipeline could look like:

# define "preprocess", "train" and "evaluate" steps in the YAML...
- pipeline:
    name: example-pipeline
    nodes:
      - name: preprocess-node
        type: execution
        step: preprocess
      - name: train-node
        type: execution
        step: train
      - name: evaluate-node
        type: execution
        step: evaluate
    edges:
      # each edge is [source-node.output.<file pattern>, target-node.input.<input name>]
      - [preprocess-node.output.*x-images*, train-node.input.x-images]
      - [preprocess-node.output.*x-labels*, train-node.input.x-labels]
      - [preprocess-node.output.*y-images*, train-node.input.y-images]
      - [preprocess-node.output.*y-labels*, train-node.input.y-labels]
      - [train-node.output.model.pb, evaluate-node.input.model.pb]
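The comment at the top of the example assumes the preprocess, train and evaluate steps are defined elsewhere in the same valohai.yaml. As a rough sketch of what one such step could look like, the train step referenced above might be defined along these lines (the Docker image and command here are illustrative assumptions, not part of the example; the input names come from the edges above):

```yaml
# hypothetical step definition; image and command are illustrative placeholders
- step:
    name: train
    image: tensorflow/tensorflow:latest-gpu  # assumed image, pick your own
    command: python train.py                 # assumed command
    inputs:
      # these input names match the edge targets in the pipeline example
      - name: x-images
      - name: x-labels
      - name: y-images
      - name: y-labels
```

With a step like this in place, the pipeline's `step: train` field and the `train-node.input.*` edge targets resolve against the step's name and declared inputs.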