Creating a pipeline with valohai-utils
Note
This tutorial is a part of our Valohai pipelines series. Make sure you have completed the Valohai fundamentals learning path first.
Even though it is not mandatory, we strongly recommend that you connect to a Git repository before using pipelines. Using Git is not a Valohai requirement, but it will make your workflows more efficient.
In this section you will learn how to use the valohai-utils helper library to build and update pipelines in valohai.yaml.
Previously, you defined the pipeline manually in your valohai.yaml. Alternatively, you can define pipelines using the valohai-utils helper library.
Start by creating a new file with the following code in it.
Create a separate Python script for your pipeline definition
When you create valohai.yaml and update the step information with vh yaml step <filename>, the code in the file is only parsed, not executed. With vh yaml pipeline <filename>, however, the main method defined below is actually run.
This means that every library the file imports must be installed in your environment, so it usually makes sense to keep the pipeline code in a separate file.
from valohai import Pipeline


def main(config) -> Pipeline:
    # Create a pipeline called "utilspipeline".
    pipe = Pipeline(name="utilspipeline", config=config)

    # Define the pipeline nodes.
    preprocess = pipe.execution("preprocess-dataset")
    train = pipe.execution("train-model")

    # Configure the pipeline, i.e. define the edges.
    preprocess.output("preprocessed_mnist.npz").to(train.input("dataset"))

    return pipe
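If you want to see how these calls map onto nodes and edges without installing anything, you can dry-run the same main function against a small stand-in. The ToyPipeline, ToyNode and ToyPort classes below are hypothetical and only mimic the calls used above (execution(), output(), input() and to()); they are not the valohai-utils implementation. The <step>_<n> node naming is an assumption based on the YAML that valohai-utils generates.

```python
class ToyPort:
    """Hypothetical stand-in for a node input or output reference."""

    def __init__(self, pipeline, ref):
        self.pipeline = pipeline
        self.ref = ref  # e.g. "train-model_1.input.dataset"

    def to(self, target):
        # Record an edge in the full source/target form.
        self.pipeline.edges.append({"source": self.ref, "target": target.ref})


class ToyNode:
    """Hypothetical stand-in for an execution node."""

    def __init__(self, pipeline, name, step):
        self.pipeline = pipeline
        self.name = name
        self.step = step

    def output(self, name):
        return ToyPort(self.pipeline, f"{self.name}.output.{name}")

    def input(self, name):
        return ToyPort(self.pipeline, f"{self.name}.input.{name}")


class ToyPipeline:
    """Records nodes and edges; this is NOT the real valohai.Pipeline."""

    def __init__(self, name):
        self.name = name
        self.nodes = []
        self.edges = []
        self._counts = {}

    def execution(self, step):
        # Assumption: nodes are named "<step>_<n>", as in the generated YAML.
        self._counts[step] = self._counts.get(step, 0) + 1
        node = ToyNode(self, f"{step}_{self._counts[step]}", step)
        self.nodes.append(node)
        return node


def main(config) -> ToyPipeline:
    # Same body as the pipeline definition above, using the stand-in class.
    pipe = ToyPipeline(name="utilspipeline")
    preprocess = pipe.execution("preprocess-dataset")
    train = pipe.execution("train-model")
    preprocess.output("preprocessed_mnist.npz").to(train.input("dataset"))
    return pipe


if __name__ == "__main__":
    pipe = main(config=None)
    print([node.name for node in pipe.nodes])
    # ['preprocess-dataset_1', 'train-model_1']
    print(pipe.edges)
    # [{'source': 'preprocess-dataset_1.output.preprocessed_mnist.npz',
    #   'target': 'train-model_1.input.dataset'}]
```

Each edge is just a pair of "<node>.output.<name>" and "<node>.input.<name>" references, which is why the builder only needs to remember two strings per to() call.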
Next, run the command:
vh yaml pipeline <filename>
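The reason for keeping the pipeline definition in its own file, mentioned above, is that this command runs the file instead of only parsing it. The standard-library sketch below is an analogy, not how the vh CLI works internally: ast.parse() accepts source that imports a package you have not installed, while executing the same source fails with an ImportError. The module name not_installed_pkg_for_demo is hypothetical.

```python
import ast

# Hypothetical pipeline file that imports a package missing from this environment.
SOURCE = """
import not_installed_pkg_for_demo

def main(config):
    return not_installed_pkg_for_demo.build(config)
"""


def can_parse(source: str) -> bool:
    """Parsing only checks the syntax; import statements are not run."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def can_execute(source: str) -> bool:
    """Executing runs the import statements, so a missing package fails."""
    try:
        exec(compile(source, "<pipeline>", "exec"), {})
        return True
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False


if __name__ == "__main__":
    print("parse ok:", can_parse(SOURCE))      # parse ok: True
    print("execute ok:", can_execute(SOURCE))  # execute ok: False
```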
If you now check your valohai.yaml, you should see a new pipeline called utilspipeline. The two pipelines are otherwise exactly alike, but their edges look different: when you created the pipeline manually, you used the shorthand edge syntax, while valohai-utils writes each edge out in full with separate source and target keys. Regardless, the pipeline is built in the same way in the UI.
- pipeline:
    name: utilspipeline
    edges:
    - configuration: {}
      source: preprocess-dataset_1.output.preprocessed_mnist.npz
      target: train-model_1.input.dataset
    nodes:
    - name: preprocess-dataset_1
      actions: []
      override: {}
      step: preprocess-dataset
      type: execution
    - name: train-model_1
      actions: []
      override: {}
      step: train-model
      type: execution
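Both edge styles carry the same information. As a sketch, assuming the shorthand form is a two-item [source, target] list as in the manually written pipeline, the hypothetical helpers below expand it into the mapping form that valohai-utils writes and compare two edge lists. They are not part of any Valohai tool.

```python
def expand_edge(edge):
    """Return an edge in the full mapping form used by valohai-utils."""
    if isinstance(edge, dict):
        # Already a mapping; only fill in a default empty configuration.
        return {"configuration": {}, **edge}
    # Assumption: shorthand edges are [source, target] pairs.
    source, target = edge
    return {"configuration": {}, "source": source, "target": target}


def same_edges(edges_a, edges_b):
    """True if both lists connect the same outputs to the same inputs."""

    def key(edge):
        full = expand_edge(edge)
        return (full["source"], full["target"])

    return sorted(map(key, edges_a)) == sorted(map(key, edges_b))


if __name__ == "__main__":
    shorthand = [["preprocess-dataset_1.output.preprocessed_mnist.npz",
                  "train-model_1.input.dataset"]]
    full = [{"configuration": {},
             "source": "preprocess-dataset_1.output.preprocessed_mnist.npz",
             "target": "train-model_1.input.dataset"}]
    print(same_edges(shorthand, full))  # True
```

Comparing your manual pipeline against this one would also require mapping the node names, since valohai-utils appends a suffix such as _1 to each node.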
You can now test this pipeline by running an --adhoc pipeline from the command line.
vh pipeline run utilspipeline --adhoc
If you want to run the pipeline in the Valohai UI, remember to push the changes to your Git repository.
git add valohai.yaml
git commit -m "Added pipeline definition"
git push
Instead of the UI, you can also run the pipeline from the CLI. Remember to fetch the changes to your Valohai project first, either by running the fetch command in the CLI or by clicking the fetch button in the UI.
vh project fetch
vh pipeline run utilspipeline
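If you iterate on the pipeline often, you can chain the commands from this page into one script. This is a minimal sketch that assumes vh and git are on your PATH; the file name pipeline.py and the commit message are placeholders, and build_commands and run_all are hypothetical helpers. By default the script only prints the commands; pass --execute to run them.

```python
import shlex
import subprocess
import sys


def build_commands(pipeline_file, pipeline_name, message):
    """Return the commands that regenerate, push, fetch and run the pipeline."""
    return [
        ["vh", "yaml", "pipeline", pipeline_file],
        ["git", "add", "valohai.yaml"],
        ["git", "commit", "-m", message],
        ["git", "push"],
        ["vh", "project", "fetch"],
        ["vh", "pipeline", "run", pipeline_name],
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = build_commands("pipeline.py", "utilspipeline", "Added pipeline definition")
    run_all(cmds, dry_run="--execute" not in sys.argv)
```

Stopping at the first failure matters here: if git push fails, there is nothing new for vh project fetch to pick up, and the run would use the old pipeline definition.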
Note
Make sure you are reading and updating the correct valohai.yaml file when creating pipelines. If you get a message saying that a step is not found, check both the valohai.yaml file and the project linked to your working directory.
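To rule out the valohai.yaml side of that check, you can verify that every pipeline node refers to a step defined in the same file. The sketch below takes the already-parsed contents of valohai.yaml (a list of - step: and - pipeline: entries, for example loaded with PyYAML's yaml.safe_load) so it needs only the standard library. missing_steps is a hypothetical helper, and it cannot tell you which project your working directory is linked to.

```python
def missing_steps(config):
    """Map each pipeline name to node steps that have no matching step entry."""
    # Names of all "- step:" entries in the parsed valohai.yaml.
    steps = {item["step"]["name"] for item in config if "step" in item}
    missing = {}
    for item in config:
        pipeline = item.get("pipeline")
        if not pipeline:
            continue
        unknown = sorted(
            node["step"]
            for node in pipeline.get("nodes", [])
            if node["step"] not in steps
        )
        if unknown:
            missing[pipeline["name"]] = unknown
    return missing


# Example: the train-model step is referenced but not defined.
EXAMPLE_CONFIG = [
    {"step": {"name": "preprocess-dataset"}},
    {"pipeline": {"name": "utilspipeline", "nodes": [
        {"name": "preprocess-dataset_1", "step": "preprocess-dataset", "type": "execution"},
        {"name": "train-model_1", "step": "train-model", "type": "execution"},
    ]}},
]

if __name__ == "__main__":
    print(missing_steps(EXAMPLE_CONFIG))  # {'utilspipeline': ['train-model']}
```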