Pipelines automate your machine learning operations in the Valohai ecosystem.

See also

You can read more about the reasoning behind general pipeline concepts like graphs, nodes and edges on the Pipelines core concepts page.

A pipeline definition has 3 required properties:

  • name: name for the pipeline

  • nodes: list of all nodes (executions and deployments) in the pipeline

  • edges: list of all edges (requirements) between the nodes
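As a rough illustration of the three required properties, the sketch below checks a pipeline definition that has already been parsed into a Python dict. The helper name `missing_properties` is hypothetical and not part of any Valohai tool.

```python
# Hypothetical helper, not part of any Valohai tool: checks a parsed
# pipeline definition (a plain dict) for the three required properties.
REQUIRED_PROPERTIES = ("name", "nodes", "edges")


def missing_properties(pipeline: dict) -> list:
    """Return the required property names absent from the definition."""
    return [key for key in REQUIRED_PROPERTIES if key not in pipeline]


pipeline = {
    "name": "simple-pipeline",
    "nodes": [
        {"name": "generate-node", "type": "execution", "step": "generate-dataset"},
    ],
    # "edges" is deliberately left out to show the check reporting it.
}
print(missing_properties(pipeline))
```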

A simple pipeline could look something like this:


  - step:
      name: generate-dataset
      image: python:3.6
      command: python preprocess.py
  - step:
      name: train-model
      image: tensorflow/tensorflow:2.2.0-gpu
      command: python train.py
      inputs:
        - name: dataset-images
          default: http://...
        - name: dataset-labels
          default: http://...
  - pipeline:
      name: simple-pipeline
      nodes:
        - name: generate-node
          type: execution
          step: generate-dataset
        - name: train-node
          type: execution
          step: train-model
        - name: deploy-node
          type: deployment
          deployment: mydeployment
          endpoints:
            - predict-digit
      edges:
        - [generate-node.output.images*, train-node.input.dataset-images]
        - [generate-node.output.labels*, train-node.input.dataset-labels]
        - [train-node.output.model*, deploy-node.file.predict-digit.model]
  - endpoint:
      name: predict-digit
      description: predict digits from image inputs ("file" parameter)
      image: tensorflow/tensorflow:1.13.1-py3
      wsgi: predict_wsgi:predict_wsgi
      files:
        - name: model
          description: Model output file from TensorFlow
          path: model.pb

This pipeline has 3 nodes. The second node, train-node, waits for its inputs to be generated by generate-node. The third node, deploy-node, deploys the model output by train-node. All files in the /valohai/outputs directory of generate-node whose names start with either images or labels are passed to train-node, and files whose names start with model are passed from train-node to the deployment.
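The wildcard selection in the edges can be pictured with a small Python sketch. It assumes the trailing `*` behaves like a shell-style glob on the output file name, which is an assumption based on the description above and not a statement about how Valohai implements it. The file names and the `files_for_edge` helper are hypothetical.

```python
from fnmatch import fnmatchcase


# Assumption: the wildcard in an edge such as "generate-node.output.images*"
# behaves like a case-sensitive shell-style glob on the output file name.
def files_for_edge(output_files, pattern):
    """Return the output file names that the edge pattern would select."""
    return [name for name in output_files if fnmatchcase(name, pattern)]


# Hypothetical contents of /valohai/outputs after generate-node finishes.
outputs = ["images-train.npz", "images-test.npz", "labels.csv", "debug.log"]

print(files_for_edge(outputs, "images*"))
print(files_for_edge(outputs, "labels*"))
```

In this sketch, debug.log matches neither pattern, so it would not be passed to train-node.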

Override default inputs

In the above example:

  • The train-model step has two inputs, each with their own default values.

  • The pipeline defines that train-node, which runs the train-model step, should use the outputs of generate-node as its inputs.

By default, Valohai includes both the files from the default input location and the files generated by the pipeline as the step’s inputs. If you want the input from the pipeline to replace the default input instead, specify an override on the node in the pipeline:

- name: train
  type: execution
  step: train-model
  override:
    inputs:
      - name: dataset-images
      - name: dataset-labels
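
The difference between the default behaviour and an override can be modelled with a toy Python function. This is only a sketch of the behaviour described above; it does not call Valohai, and the `resolve_input` name and the file names are hypothetical.

```python
# Toy model of the described behaviour; it does not call Valohai.
def resolve_input(default_files, pipeline_files, override=False):
    """Return the files a node input receives.

    Without an override, the step's default files and the files arriving
    over the pipeline edge are both included. With an override, only the
    files from the edge are kept.
    """
    if override:
        return list(pipeline_files)
    return list(default_files) + list(pipeline_files)


# Hypothetical file names standing in for the real inputs.
defaults = ["default-images.tar"]
generated = ["images-train.npz", "images-test.npz"]

print(resolve_input(defaults, generated))
print(resolve_input(defaults, generated, override=True))
```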