Using Inputs with valohai-cli

Let’s assume the Valohai YAML of your project defines a step similar to the following:

- step:
  # ...
  name: train-model
  # ...
  inputs:
    - name: training-set-images
    - name: training-set-labels

If you have a local project linked to Valohai, you can run the step with the following command:

$ vh exec run train-model

However, this will fail because the inputs aren’t defined.

So, how can you refer to various datasets?

Option #1: Custom Store URL

You can connect private data stores to Valohai projects.

If a connected store contains files that Valohai doesn’t know about, such as files you uploaded there yourself, you can refer to them with the following syntax:

  • Azure Blob Storage: azure://{account_name}/{container_name}/{blob_name}
  • Google Storage: gs://{bucket}/{key}
  • Amazon S3: s3://{bucket}/{key}
  • OpenStack Swift: swift://{project}/{container}/{key}
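
As a rough illustration of how these URL patterns are put together, the sketch below joins a scheme and its path components in plain POSIX shell. The `store_url` helper and all account, container, and bucket names are hypothetical placeholders; nothing here is part of valohai-cli.

```shell
# Hypothetical helper: join a scheme and path components into a store URL.
store_url() {
    scheme="$1"; shift
    url="${scheme}://$1"; shift
    for part in "$@"; do
        url="${url}/${part}"
    done
    printf '%s\n' "$url"
}

# Placeholder names; replace them with your own store details.
store_url s3 my-bucket dataset/images/train.zip
store_url azure myaccount mycontainer images/train.zip
```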

This syntax also supports wildcards to download multiple files:

  • s3://my-bucket/dataset/images/*.jpg for all .jpg (JPEG) files
  • s3://my-bucket/dataset/image-sets/**.jpg for all .jpg (JPEG) files, including those in subdirectories
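
When a wildcard URL is passed on the command line, it is worth quoting it so that your local shell does not try to expand the asterisk itself (zsh, for example, stops with "no matches found" on an unmatched pattern). A minimal sketch, reusing the placeholder bucket path from above; it only prints the argument and does not start an execution:

```shell
# Placeholder wildcard URL; single quotes keep the shell from expanding the asterisk.
images_url='s3://my-bucket/dataset/images/*.jpg'

# Print the argument exactly as the command-line tool would receive it.
printf '%s\n' "--training-set-images=${images_url}"
```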

Tip

If you are using your own data store, the Data browser (2) shows the exact location of each file.

Where to find the file path in your data store.

Usage example:

$ vh exec run train-model \
    --training-set-images=s3://my-bucket/dataset/images/train.zip \
    --training-set-labels=s3://my-bucket/dataset/labels/train.zip
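
Since the step fails when its inputs are missing, a small wrapper can check that both values are present before anything is launched. The following is only a sketch: `run_train_model` is a hypothetical helper name, and the function prints the command instead of running it.

```shell
# Hypothetical wrapper around the command shown above.
run_train_model() {
    images="${1:-}"
    labels="${2:-}"
    # Refuse to continue if either input URL is missing.
    if [ -z "$images" ] || [ -z "$labels" ]; then
        echo "usage: run_train_model IMAGES_URL LABELS_URL" >&2
        return 1
    fi
    # Prints the command only; remove "echo" to actually start the execution.
    echo vh exec run train-model \
        "--training-set-images=${images}" \
        "--training-set-labels=${labels}"
}
```

For example, `run_train_model s3://my-bucket/dataset/images/train.zip s3://my-bucket/dataset/labels/train.zip` prints the same command as the usage example, while calling it with an empty argument returns a non-zero status.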

Option #2: Datum URI

You can use the datum://<identifier> syntax to refer to specific files that the Valohai platform already knows about.

A file has a datum identifier if it was uploaded to Valohai either:

  1. by another execution, or
  2. by using the Valohai web interface uploader under the “Data” tab of the project

Tip

Find the datum URI through the “datum://” button under the “Data” tab of your project.

Where to find the datum URI with its identifier.

Usage example:

$ vh exec run train-model \
    --training-set-images=datum://01685ff1-5a7a-c36b-e79e-80623acea29f \
    --training-set-labels=datum://01685ff1-5930-8c09-83d1-cd174c9770ab

Option #3: Public HTTP(S) URL

If your data is available through an HTTP(S) address, use the URL as-is.

Usage example:

$ vh exec run train-model \
    --training-set-images=https://example.com/train-images.zip \
    --training-set-labels=https://example.com/train-labels.zip
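
To recap the three options, the sketch below sorts an input value into one of them by looking at its URL scheme. `input_kind` is a hypothetical helper written for illustration; valohai-cli does not provide it, and the scheme list simply mirrors the ones named on this page.

```shell
# Hypothetical helper: report which option an input URL corresponds to.
input_kind() {
    case "$1" in
        s3://*|gs://*|azure://*|swift://*) echo "store" ;;  # Option #1
        datum://*)                         echo "datum" ;;  # Option #2
        http://*|https://*)                echo "http" ;;   # Option #3
        *)                                 echo "unknown" ;;
    esac
}

input_kind "s3://my-bucket/dataset/images/train.zip"
input_kind "datum://01685ff1-5a7a-c36b-e79e-80623acea29f"
input_kind "https://example.com/train-images.zip"
```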