Using Inputs with
Let’s assume we have something similar to the following set up in Valohai YAML:
- step:
    # ...
    name: train-model
    # ...
    inputs:
      - name: training-set-images
      - name: training-set-labels
If you also have a local project linked to Valohai, you can run the step with the following command.
$ vh exec run train-model
However, this will crash because the step's inputs have not been given any values.
So, how can you refer to various datasets?
Option #1: Custom Store URL
You can connect private data stores to Valohai projects.
If a connected store contains files that Valohai doesn’t know about, such as files you uploaded there yourself, you can refer to them with the following syntax.
- Azure Blob Storage:
- Google Storage:
- Amazon S3:
- OpenStack Swift:
This syntax also supports wildcards for downloading multiple files:
- s3://my-bucket/dataset/images/*.jpg for all .jpg (JPEG) files
- s3://my-bucket/dataset/image-sets/**.jpg for recursing subdirectories for all .jpg (JPEG) files
$ vh exec run train-model \
    --training-set-images=s3://my-bucket/dataset/images/train.zip \
    --training-set-labels=s3://my-bucket/dataset/labels/train.zip
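When a wildcard URL is passed on the command line, the local shell may try to expand the `*` itself before `vh` sees it. Some shells, for example zsh with default settings, reject a glob that matches no local file. The following is a minimal sketch, assuming a POSIX-like shell and the same step and bucket names as above. It quotes the values so the pattern stays literal, and it uses `echo` to print the command rather than run it.

```shell
# Sketch: single quotes keep the wildcard from being expanded by the local shell.
images='s3://my-bucket/dataset/images/*.jpg'
labels='s3://my-bucket/dataset/labels/train.zip'

# Print the command for inspection; remove `echo` to actually start the execution.
echo vh exec run train-model \
    "--training-set-images=${images}" \
    "--training-set-labels=${labels}"
```

Keeping the URLs in variables is only a convenience here. Quoting the argument directly on the `vh` command line should have the same effect.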
Option #2: Datum URI
You can use the datum://<identifier> syntax to refer to specific files that the Valohai platform already knows about.
Files will have a datum URL if the files were uploaded to Valohai either:
- as outputs from another execution
- or using the Valohai web interface uploader under the “Data” tab of the project
You can find the datum URL by clicking the “datum://” button under the “Data” tab of the project.
$ vh exec run train-model \
    --training-set-images=datum://01685ff1-5a7a-c36b-e79e-80623acea29f \
    --training-set-labels=datum://01685ff1-5930-8c09-83d1-cd174c9770ab
Option #3: Public HTTP(S) URL
If your dataset is public and available through an HTTP(S) address, you can use that.
$ vh exec run train-model \
    --training-set-images=https://example.com/train-images.zip \
    --training-set-labels=https://example.com/train-labels.zip
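The three options differ only in the URL scheme of the value passed to each input flag. As an illustration, here is a small sketch of a hypothetical shell helper, `input_kind`, which is not part of the `vh` CLI. It names the option a given value falls under. It recognizes only the schemes that appear in the examples on this page, so any other store scheme is reported as unknown.

```shell
# Hypothetical helper (not part of the vh CLI): classify an input URL by its scheme.
input_kind() {
    case "$1" in
        datum://*)          echo "datum" ;;    # Option #2: file Valohai already knows
        http://*|https://*) echo "http" ;;     # Option #3: public HTTP(S) address
        s3://*)             echo "store" ;;    # Option #1: custom store (S3 form only)
        *)                  echo "unknown" ;;  # anything this sketch does not cover
    esac
}

input_kind "datum://01685ff1-5a7a-c36b-e79e-80623acea29f"
input_kind "https://example.com/train-images.zip"
input_kind "s3://my-bucket/dataset/images/train.zip"
```

A check like this could run in a wrapper script before calling `vh exec run`, to catch a mistyped scheme before the execution is created.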