Inputs with Valohai CLI

Assume you have something like the following defined in your Valohai YAML:

- step:
    # ...
    name: train-model
    # ...
    inputs:
      - name: training-set-images
      - name: training-set-labels

If you have a local project linked to Valohai, you can run the step with the following command.

$ vh exec run train-model

However, this will crash because the inputs have no values.

So, how can you refer to various datasets?

Option #1: Datum URL

You can use the datum://<identifier> syntax to refer to input files.

A file has a datum URL if it was uploaded to Valohai either:

  1. as an output from another execution, or
  2. using the Valohai web interface uploader under the “Data” tab of the project

You can find the datum URL by clicking the “datum://” button under the “Data” tab of the project.

Usage example:

$ vh exec run train-model \
    --training-set-images=datum://01685ff1-5a7a-c36b-e79e-80623acea29f
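
The example step declares two inputs, so a full command would presumably give a value for each. The following is a minimal sketch in POSIX shell that assembles such a command and prints it for review rather than running it. It assumes the second input is passed as `--training-set-labels=...` by analogy with the usage example above; `build_command` is a hypothetical helper and `LABELS_DATUM` holds a placeholder, not a real identifier.

```shell
# Datum URL taken from the usage example above.
IMAGES_DATUM='datum://01685ff1-5a7a-c36b-e79e-80623acea29f'
# Placeholder: replace with a datum URL copied from the "Data" tab.
LABELS_DATUM='datum://<identifier>'

# Assumption: each input is passed as --<input-name>=<url>,
# following the pattern of the usage example above.
build_command() {
    printf 'vh exec run train-model --training-set-images=%s --training-set-labels=%s\n' "$1" "$2"
}

# Print the command for review instead of executing it.
build_command "$IMAGES_DATUM" "$LABELS_DATUM"
```

Once the placeholder is replaced with a real datum URL, the printed line can be pasted into a terminal.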

Option #2: Custom Store URL

You can connect custom private stores to Valohai projects.

If you connect a store containing files that Valohai doesn’t know about, such as files you uploaded there yourself, you can refer to those files with the following syntax.

  • Azure Blob Storage: azure://{account_name}/{container_name}/{blob_name}
  • Google Storage: gs://{bucket}/{key}
  • Amazon S3: s3://{bucket}/{key}
  • OpenStack Swift: swift://{project}/{container}/{key}
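
The patterns above can be filled in mechanically from their parts. The following is a minimal sketch in POSIX shell; the function names (`s3_url`, `gs_url`, `azure_url`, `swift_url`) and the example key `dataset/labels.csv` are hypothetical and not part of the Valohai CLI.

```shell
# Hypothetical helpers that fill in the URL patterns listed above.
# They only build strings; they do not contact any store.
s3_url()    { printf 's3://%s/%s\n' "$1" "$2"; }             # bucket, key
gs_url()    { printf 'gs://%s/%s\n' "$1" "$2"; }             # bucket, key
azure_url() { printf 'azure://%s/%s/%s\n' "$1" "$2" "$3"; }  # account_name, container_name, blob_name
swift_url() { printf 'swift://%s/%s/%s\n' "$1" "$2" "$3"; }  # project, container, key

# Prints: s3://my-bucket/dataset/labels.csv
s3_url my-bucket dataset/labels.csv
```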

This syntax also supports wildcards for downloading multiple files:

  • s3://my-bucket/dataset/images/*.jpg for all .jpg (JPEG) files
  • s3://my-bucket/dataset/image-sets/**.jpg for all .jpg (JPEG) files, recursing into subdirectories
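
When a wildcard URL is typed in a shell, the shell itself may try to expand the `*` against local file names before the CLI receives it. The following is a minimal sketch, assuming a POSIX shell, of keeping the pattern literal with quotes; the variable name `IMAGES` is illustrative only.

```shell
# Single quotes stop the local shell from glob-expanding the asterisk.
IMAGES='s3://my-bucket/dataset/images/*.jpg'

# Double-quote the variable when using it so the pattern is passed through unchanged.
printf '%s\n' "--training-set-images=$IMAGES"
```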

Usage example:

$ vh exec run train-model \
    --training-set-images=s3://my-bucket/dataset/images/

Option #3: Public HTTP(S) URL

If your dataset is public and available through an HTTP(S) address, you can use that address directly.

Usage example:

$ vh exec run train-model \
    --training-set-images= \