Inputs: Access Your Data
Valohai handles secure access to your files in object storage services such as AWS S3, Azure Blob Storage, and Google Cloud Storage, among others.
💡 Your data stays where it is. Valohai downloads files only when needed and manages caching automatically.
How Inputs Work
Configure data store access once (project or organization level)
Define inputs in valohai.yaml with cloud URLs
Access files locally at /valohai/inputs/, no download code needed
Valohai handles authentication, parallel downloads, and caching behind the scenes.
Quick Example
Define in valohai.yaml
- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command:
      - python train_model.py
    inputs:
      - name: images
        default:
          - s3://mybucket/factories/images/*.png
        keep-directories: suffix
      - name: model
        default: datum://production-latest
        filename: model.pkl

The inputs will be downloaded to /valohai/inputs/.
In the above case, you’ll find your:
images in the directory /valohai/inputs/images/ with the folder structure from your object data stores intact.
model will be downloaded to the directory /valohai/inputs/model/ and the file will always be renamed to model.pkl.
Use in Python
In Python you’ll access these files like any other file, as they’ll be available locally on the machine.
import os
from PIL import Image

# Path to downloaded model
model_path = '/valohai/inputs/model/model.pkl'
# Path to the directory where the images are downloaded
images_directory = '/valohai/inputs/images/'
# Loop through images in the directory
for root, dirs, files in os.walk(images_directory):
    for filename in files:
        image_path = os.path.join(root, filename)
        image_name = os.path.basename(image_path)
        image = Image.open(image_path)
        image.load()

That's it. No boto3, no credentials, no download loops.
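As a small variation on the loop above, the sketch below collects every file of one input into a sorted list. `list_input_files` is a hypothetical helper, not part of any Valohai library, and the `inputs_root` parameter is an assumption added only so the function can be pointed at a scratch directory when testing outside an execution.

```python
from pathlib import Path

# Default root taken from the text above. The parameter is an assumption
# added for local testing; it is not a Valohai setting.
DEFAULT_INPUTS_ROOT = "/valohai/inputs"


def list_input_files(input_name, inputs_root=DEFAULT_INPUTS_ROOT):
    """Return every file under <inputs_root>/<input_name>, sorted by path.

    Hypothetical helper for illustration only.
    """
    input_dir = Path(inputs_root) / input_name
    if not input_dir.is_dir():
        # Unknown or empty input: return an empty list instead of raising.
        return []
    # rglob("*") also walks subdirectories, which matters when
    # keep-directories preserves a folder structure.
    return sorted(p for p in input_dir.rglob("*") if p.is_file())
```

Inside an execution, `list_input_files("images")` would return the downloaded image paths, and `list_input_files("model")` the single renamed model file.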
Common Patterns
Multiple Files with Wildcards
inputs:
  - name: images
    default:
      - s3://mybucket/train/images/*.jpg
      - s3://mybucket/train/images/*.png

All matching files download to /valohai/inputs/images/.
Multiple Cloud Sources
inputs:
  - name: data
    default:
      - s3://aws-bucket/data/*.parquet
      - azure://mycontainer/data/*.parquet
      - gs://gcs-bucket/data/*.parquet

Mix and match storage providers in one input. All the files will be downloaded under /valohai/inputs/data/.
💡 Files defined under the same input are downloaded to the same directory. If their names are not unique, they will overwrite each other and only one of them will be available in the execution.
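One way to catch such collisions before launching an execution is to compare the file names of the URLs you plan to pass. The sketch below is a hypothetical pre-flight check, not a Valohai feature. It assumes concrete URLs (wildcards already resolved to a list) and a flat download directory, that is, keep-directories left at its default.

```python
from collections import defaultdict
from posixpath import basename
from urllib.parse import urlsplit


def find_name_collisions(urls):
    """Return {file_name: [urls]} for every file name used by more than one URL.

    Hypothetical helper: it only compares the last path segment of each URL.
    """
    by_name = defaultdict(list)
    for url in urls:
        # urlsplit separates scheme, bucket and path for s3://, gs://, azure://
        name = basename(urlsplit(url).path)
        by_name[name].append(url)
    return {name: group for name, group in by_name.items() if len(group) > 1}
```

An empty result means every file keeps its own name in the input directory. Any entry in the result lists the URLs that would compete for the same local path.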
Single File with Rename
inputs:
  - name: pretrained
    default: s3://models/bert-base.h5
    filename: model.h5 # Always save as this name

Access at /valohai/inputs/pretrained/model.h5.
Keep Directory Structure
keep-directories defines which folder structure Valohai uses in the inputs folder.
none: (default) all files are downloaded to /valohai/inputs/myinput
full: keeps the full path from the storage root. For example, s3://special-bucket/foo/bar/**.jpg could end up as /valohai/inputs/myinput/foo/bar/dataset1/a.jpg
suffix: keeps the suffix from the “wildcard root”. For example, with s3://special-bucket/foo/bar/* the special-bucket/foo/bar/ would be removed, but any relative path after it would be kept, and you might end up with /valohai/inputs/myinput/dataset1/a.jpg
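The three modes can be pictured as a mapping from a storage URL to a local path. The sketch below reproduces the examples above. It is an illustration of the described behaviour, not Valohai's implementation: `local_input_path` and `object_key` are hypothetical names, and real edge cases may be handled differently.

```python
import posixpath

WILDCARD_CHARS = "*?["


def object_key(url):
    """Strip scheme and bucket: s3://bucket/foo/a.jpg -> foo/a.jpg."""
    _, _, rest = url.partition("://")
    _, _, key = rest.partition("/")
    return key


def local_input_path(file_url, pattern_url, input_name, keep_directories="none"):
    """Illustrate where a file matched by pattern_url would land locally."""
    key = object_key(file_url)
    base = posixpath.join("/valohai/inputs", input_name)
    if keep_directories == "none":
        # Flat layout: only the file name survives.
        return posixpath.join(base, posixpath.basename(key))
    if keep_directories == "full":
        # Full path from the storage root (the bucket itself is dropped).
        return posixpath.join(base, key)
    if keep_directories == "suffix":
        # The "wildcard root" is taken here as the leading path segments
        # of the pattern that contain no wildcard characters.
        root_parts = []
        for part in object_key(pattern_url).split("/"):
            if any(ch in part for ch in WILDCARD_CHARS):
                break
            root_parts.append(part)
        prefix = "/".join(root_parts) + "/" if root_parts else ""
        relative = key[len(prefix):] if key.startswith(prefix) else posixpath.basename(key)
        return posixpath.join(base, relative)
    raise ValueError(f"unknown keep-directories mode: {keep_directories!r}")
```

For the file s3://special-bucket/foo/bar/dataset1/a.jpg and the input name myinput, the function returns the same three paths as the list above, one per mode.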
inputs:
  - name: dataset
    default: s3://bucket/project/**/*.json
    keep-directories: suffix # Preserves folder structure

Override Inputs at Runtime
Default inputs are just starting points. Override them when running:
# CLI
vh execution run train-model \
  --dataset=s3://different-bucket/experiment-data/*.csv
# Or use the web UI to browse and select different files
# Or pass different URLs via API

Quick Reference
Define inputs in valohai.yaml
inputs:
  - name: mydata
    default: s3://bucket/*.csv
    # Access at: /valohai/inputs/mydata/file.csv

Use as local files
Inputs are available under /valohai/inputs/{input-name}/:
import pandas as pd
import glob
# Files are already downloaded to /valohai/inputs/mydata/
csv_files = glob.glob('/valohai/inputs/mydata/*.csv')

Dynamic File Selection
Don't hardcode paths in YAML. Pass them at runtime:
vh execution run train \
  --images=s3://bucket/client-xyz/images/*.jpg

Inputs can be overridden at runtime
vh execution run train-model \
  --dataset=s3://different-bucket/experiment-data/*.csv
# Or in the UI / with the API

Options
filename: newname.ext - Rename a single input file on download
keep-directories: suffix - Preserve folder structure
