Using Inputs with valohai-cli
Let’s assume we have something similar to the following set up in Valohai YAML:
- step:
    # ...
    name: train-model
    # ...
    inputs:
      - name: training-set-images
      - name: training-set-labels
If you also have a local project linked to Valohai, you can run the step with the following command:
vh exec run train-model
But this will crash because no files have been defined for the inputs.
So, how can you refer to various datasets?
Option #1: Custom Store URL
You can connect private data stores to Valohai projects.
If you connect a store that contains files that Valohai doesn’t know about, like the files that you have uploaded there yourself, you can use the following syntax to refer to the files.
Azure Blob Storage:
This syntax also supports wildcards for downloading multiple files:
s3://my-bucket/dataset/images/*.jpg for all .jpg (JPEG) files
s3://my-bucket/dataset/image-sets/**.jpg for all .jpg (JPEG) files, recursing into subdirectories
If you are using your own data store, the Data browser (2) shows the exact location of each file.
vh exec run train-model \
  --training-set-images=s3://my-bucket/dataset/images/train.zip \
  --training-set-labels=s3://my-bucket/dataset/labels/train.zip
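When several inputs live under the same store prefix, it can help to build the URLs from a shell variable. The following is a minimal sketch, assuming a POSIX shell; the variable names `STORE_PREFIX`, `IMAGES_URL` and `LABELS_URL` are illustrative and not part of valohai-cli. It only prints the command for review instead of running it.

```shell
# Hypothetical helper variables; only the vh command and its
# --<input-name>=<url> flags come from the example above.
STORE_PREFIX="s3://my-bucket/dataset"
IMAGES_URL="${STORE_PREFIX}/images/train.zip"
LABELS_URL="${STORE_PREFIX}/labels/train.zip"

# Dry run: echo prints the full command so it can be checked first.
echo vh exec run train-model \
  "--training-set-images=${IMAGES_URL}" \
  "--training-set-labels=${LABELS_URL}"
```

Removing the leading `echo` would start the execution instead of printing the command.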
Option #2: Datum URI
You can use the datum://<identifier> syntax to refer to specific files that the Valohai platform already knows about.
Files will have a datum identifier if the files were uploaded to Valohai either:
by another execution, or
by using the Valohai web interface uploader under the “Data” tab of the project
vh exec run train-model \
  --training-set-images=datum://01685ff1-5a7a-c36b-e79e-80623acea29f \
  --training-set-labels=datum://01685ff1-5930-8c09-83d1-cd174c9770ab
Option #3: Public HTTP(S) URL
If your data is available through an HTTP(S) address, use the URL as-is.
vh exec run train-model \
  --training-set-images=https://example.com/train-images.zip \
  --training-set-labels=https://example.com/train-labels.zip
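In a wrapper script, it may be useful to check which of the three options a given input value falls under before starting an execution. The following is a rough sketch, assuming a POSIX shell; `input_kind` is a hypothetical helper, not part of valohai-cli, and it only recognizes the URL schemes shown on this page, so other store schemes would need their own patterns.

```shell
# input_kind: hypothetical helper that labels an input value by its scheme.
# It only looks at the prefix; it does not check that the file exists.
input_kind() {
  case "$1" in
    datum://*)          echo "datum" ;;    # Option #2: datum URI
    http://*|https://*) echo "http" ;;     # Option #3: public HTTP(S) URL
    s3://*)             echo "store" ;;    # Option #1: custom store URL
    *)                  echo "unknown" ;;  # anything else
  esac
}

input_kind "s3://my-bucket/dataset/images/train.zip"
input_kind "datum://01685ff1-5a7a-c36b-e79e-80623acea29f"
input_kind "https://example.com/train-images.zip"
```

A script could refuse to call `vh exec run` when any input comes back as `unknown`, which catches typos such as a missing scheme early.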