Inputs: Access Your Data

Valohai handles secure access to your files in object storage such as AWS S3, Azure Blob Storage, Google Cloud Storage, and more.

💡 Your data stays where it is. Valohai downloads files only when needed and manages caching automatically.

How Inputs Work

  1. Configure data store access once (project or organization level)

  2. Define inputs in valohai.yaml with cloud URLs

  3. Access files locally at /valohai/inputs/, no download code needed

Valohai handles authentication, parallel downloads, and caching behind the scenes.

Quick Example

Define in valohai.yaml

- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command:
        - python train_model.py
    inputs:
        - name: images
          default:
          - s3://mybucket/factories/images/*.png
          keep-directories: suffix
        - name: model
          default: datum://production-latest
          filename: model.pkl

The inputs will be downloaded to /valohai/inputs/.

In the example above:

  • images are placed in /valohai/inputs/images/. Because keep-directories is set to suffix, any folder structure below s3://mybucket/factories/images/ is preserved.

  • model is downloaded to /valohai/inputs/model/ and the file is always renamed to model.pkl.

Use in Python

In Python, you read these files like any other file, because they are available locally on the machine.
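As a minimal sketch, the inputs from the example above can be read with the standard library alone. The helper names and the inputs_root parameter are illustrative assumptions, not part of Valohai. The parameter exists only so the same code can be pointed at a local folder while testing.

```python
from pathlib import Path

# Directory where Valohai places inputs inside an execution.
INPUTS_ROOT = Path("/valohai/inputs")


def list_images(inputs_root=INPUTS_ROOT):
    """Return all .png files of the 'images' input, including subfolders."""
    return sorted((Path(inputs_root) / "images").rglob("*.png"))


def read_model_bytes(inputs_root=INPUTS_ROOT):
    """Read the 'model' input, which the YAML above renames to model.pkl."""
    return (Path(inputs_root) / "model" / "model.pkl").read_bytes()
```

Inside an execution, calling list_images() and read_model_bytes() without arguments reads from /valohai/inputs/.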

That's it. No boto3, no credentials, no download loops.

Optional: Use the valohai-utils Python helper tool

The valohai-utils helper library offers a simpler syntax:
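The sketch below shows the same access through valohai-utils. It assumes the package is installed (pip install valohai-utils) and that the inputs are named images and model, as in the example above. The wrapper function resolve_inputs is a hypothetical name. The import sits inside it only so the snippet can be loaded on machines without the package.

```python
def resolve_inputs():
    """Return (image_paths, model_path) for the example step's inputs."""
    import valohai  # provided by the valohai-utils package

    # paths() yields every file that belongs to a multi-file input.
    image_paths = list(valohai.inputs("images").paths())
    # path() returns the location of a single-file input.
    model_path = valohai.inputs("model").path()
    return image_paths, model_path
```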

Common Patterns

Multiple Files with Wildcards

All matching files download to /valohai/inputs/images/.

Multiple Cloud Sources

Mix and match storage providers in one input. All the files will be downloaded under /valohai/inputs/data/.

💡 Files defined under the same input are downloaded to the same directory. If their names are not unique, they overwrite each other and only one of them is available in the execution.
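Because of this, it can help to check a list of input URLs for clashing file names before launching a run. The sketch below is an illustration only: find_name_clashes is a hypothetical helper, the URLs are made up, and it compares only the final path component of explicit (non-wildcard) URLs.

```python
from collections import defaultdict
from posixpath import basename
from urllib.parse import urlparse


def find_name_clashes(urls):
    """Map each file name that appears more than once to its source URLs."""
    by_name = defaultdict(list)
    for url in urls:
        by_name[basename(urlparse(url).path)].append(url)
    return {name: sources for name, sources in by_name.items() if len(sources) > 1}


# Hypothetical URLs that mix two providers under one input.
example_urls = [
    "s3://my-bucket/train/labels.csv",
    "gs://other-bucket/archive/labels.csv",
    "s3://my-bucket/train/images.zip",
]
print(find_name_clashes(example_urls))
```

Any name reported here would end up as a single file in the input directory, so rename the sources or split them into separate inputs.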

Single File with Rename

Access at /valohai/inputs/pretrained/model.h5.

Keep Directory Structure

keep-directories defines which folder structure Valohai uses in the inputs folder.

  • none: (default) all files are downloaded to /valohai/inputs/myinput

  • full: keeps the full path from the storage root. For example, a file matched by s3://special-bucket/foo/bar/**.jpg could end up as /valohai/inputs/myinput/foo/bar/dataset1/a.jpg

  • suffix: keeps the path after the “wildcard root”. For example, with s3://special-bucket/foo/bar/* the special-bucket/foo/bar/ part is removed but any relative path after it is kept, so a file might end up as /valohai/inputs/myinput/dataset1/a.jpg
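The three modes can be sketched as a small path-mapping function. This is an illustration of the rules described above, not Valohai's implementation. The function name local_path and its parameters are assumptions, and the wildcard root is taken to be everything up to the last "/" before the first "*".

```python
from posixpath import basename, join
from urllib.parse import urlparse


def local_path(input_name, pattern_url, file_url, keep_directories="none"):
    """Return where file_url would land locally under the given mode."""
    base = join("/valohai/inputs", input_name)
    key = urlparse(file_url).path.lstrip("/")  # path inside the bucket
    if keep_directories == "none":
        return join(base, basename(key))
    if keep_directories == "full":
        return join(base, key)
    if keep_directories == "suffix":
        pattern_key = urlparse(pattern_url).path.lstrip("/")
        before_wildcard = pattern_key.split("*", 1)[0]
        wildcard_root = before_wildcard[: before_wildcard.rfind("/") + 1]
        if not key.startswith(wildcard_root):
            raise ValueError("file is not below the wildcard root")
        return join(base, key[len(wildcard_root):])
    raise ValueError("unknown keep-directories mode: %r" % keep_directories)


pattern = "s3://special-bucket/foo/bar/*"
matched = "s3://special-bucket/foo/bar/dataset1/a.jpg"
for mode in ("none", "full", "suffix"):
    print(mode, "->", local_path("myinput", pattern, matched, mode))
```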

Override Inputs at Runtime

Default inputs are just starting points. Override them when running:

Quick Reference

Define inputs in valohai.yaml

Use as local files

Inputs are available under /valohai/inputs/{input-name}/:
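As a quick way to see what actually arrived, the sketch below builds an inventory of every input directory. The helper name inventory_inputs and its inputs_root parameter are assumptions made for illustration. It returns an empty mapping when the directory does not exist, for example on a laptop.

```python
from pathlib import Path


def inventory_inputs(inputs_root="/valohai/inputs"):
    """Map each input name to the relative paths of the files it contains."""
    root = Path(inputs_root)
    if not root.is_dir():
        return {}
    return {
        input_dir.name: sorted(
            p.relative_to(input_dir).as_posix()
            for p in input_dir.rglob("*")
            if p.is_file()
        )
        for input_dir in sorted(root.iterdir())
        if input_dir.is_dir()
    }
```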

Dynamic File Selection

Don't hardcode paths in YAML. Pass them at runtime:

Inputs can be overridden at runtime

Options

  • filename: newname.ext — Rename single input file on download

  • keep-directories: suffix — Preserve folder structure

