step.inputs

inputs are the data files that are available during step execution.

An input in inputs has the following potential properties (a sketch of a step using them follows the list):

  • name: The input name; this is shown on the user interface and names the directory where the input files will be placed during execution, e.g. /valohai/inputs/my-input-name.
  • default: (optional) The default source where the input will be fetched from. If not defined, the user has to define the source at the start of the execution.
  • optional: (optional) Marks this input as optional, so a URL definition is not necessary before the execution of the step.
  • filename: (optional) Sets a custom name for the downloaded file.
  • keep-directories: (optional) governs how directory paths are retained when using wildcards.
    • none: (default) all files are downloaded to /valohai/inputs/my-input-name
    • full: keeps the full path from the storage root.
      • E.g. with s3://special-bucket/foo/bar/**.jpg, a file could end up as /valohai/inputs/my-input-name/foo/bar/dataset1/a.jpg
    • suffix: keeps the suffix from the “wildcard root”.
      • E.g. with s3://special-bucket/foo/bar/**.jpg, the prefix special-bucket/foo/bar/ would be removed, but any relative path after it would be kept, so you might end up with /valohai/inputs/my-input-name/dataset1/a.jpg
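
For illustration, a step definition using these properties might look like the following sketch in valohai.yaml; the step name, Docker image, command, and bucket paths are placeholders:

  - step:
      name: example-step              # placeholder step name
      image: python:3.9               # placeholder Docker image
      command: python train.py
      inputs:
        - name: my-input-name
          default: s3://special-bucket/foo/bar/**.jpg
          keep-directories: suffix    # keep paths relative to the wildcard root
        - name: extra-config
          default: s3://special-bucket/configs/config.json
          optional: true
          filename: config.json       # downloaded as /valohai/inputs/extra-config/config.json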

Currently valid sources for inputs are HTTP(S) and various cloud-provider-specific data stores such as AWS S3 (s3://...) and Azure Storage (azure://...).

See also

Read more about custom data stores on the Data Stores documentation page.

For HTTP(S) endpoints, basic access authentication is supported; for the cloud provider stores, access credentials must be configured under the project settings.

During the step execution, inputs are available under /valohai/inputs/<input name>/<input file>. To see this in action, try running ls -la /valohai/inputs/ as the main command of an execution that has inputs.
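
For example, a throwaway step along these lines (the step name, image, and URL are placeholders) would simply list the downloaded files:

  - step:
      name: list-inputs
      image: busybox                  # any image with a shell works
      command: ls -la /valohai/inputs/
      inputs:
        - name: my-input-name
          default: https://example.com/data.csv   # placeholder URL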

Tip

You can download any files you want during the execution with e.g. Python libraries or command-line tools, but this makes your executions slower as it circumvents our input file caching system.

When you specify the actual input, or the default for one, you have three options:

Option #1: Custom Store URL

You can connect private data stores to Valohai projects.

If you connect a store that contains files that Valohai doesn’t know about, such as files you have uploaded there yourself, you can use the following syntax to refer to them.

  • Azure Blob Storage: azure://{account_name}/{container_name}/{blob_name}
  • Google Storage: gs://{bucket}/{key}
  • Amazon S3: s3://{bucket}/{key}
  • OpenStack Swift: swift://{project}/{container}/{key}

This syntax also supports wildcards for downloading multiple files (see the sketch after this list):

  • s3://my-bucket/dataset/images/*.jpg for all .jpg (JPEG) files
  • s3://my-bucket/dataset/image-sets/**.jpg for all .jpg (JPEG) files, recursing into subdirectories
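
As a sketch, these URLs go into the default property of an input; the bucket name and paths below are placeholders:

  inputs:
    - name: images
      default: s3://my-bucket/dataset/images/*.jpg        # all JPEGs in one directory
    - name: image-sets
      default: s3://my-bucket/dataset/image-sets/**.jpg   # recurses into subdirectories
      keep-directories: suffix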

Tip

If you are using your own data store, we show the exact location of each file in the Data browser.

Where to find the file path in your data store.

Option #2: Datum URI

You can use the datum://<identifier> syntax to refer to specific files the Valohai platform already knows about (see the sketch after the list below).

Files will have a datum identifier if they were uploaded to Valohai either:

  1. by another execution, or
  2. by using the Valohai web interface uploader under “Data” tab of the project
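
As a sketch, a datum URI is used like any other input source; the identifier below is a made-up placeholder:

  inputs:
    - name: model
      default: datum://01234567-89ab-cdef-0123-456789abcdef   # placeholder identifier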

Tip

Find the datum URL through the “datum://” button under “Data” tab of your project.

Where to find datum URL with identifier.

Option #3: Public HTTP(S) URL

If your data is available through an HTTP(S) address, use the URL as-is.
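
For example (the URL below is a placeholder):

  inputs:
    - name: dataset
      default: https://example.com/data/dataset.csv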