# Inputs: Access Your Data

Valohai handles secure access to your files in object storages like for example AWS S3, Azure Blob Storage, Google Cloud Storage, and more.

> 💡 **Your data stays where it is.** Valohai downloads files only when needed and manages caching automatically.

### How Inputs Work

1. **Configure data store access** once (project or organization level)
2. **Define inputs** in `valohai.yaml` with cloud URLs
3. **Access files locally** at `/valohai/inputs/`, no download code needed

Valohai handles authentication, parallel downloads, and caching behind the scenes.

### Quick Example

#### Define in valohai.yaml

```yaml
- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command:
        - python train_model.py
    inputs:
        - name: images
          default:
          - s3://mybucket/factories/images/*.png
          keep-directories: suffix
        - name: model
          default: datum://production-latest
          filename: model.pkl
```

The inputs will be downloaded to `/valohai/inputs/`.

In the above case, you'll find your:

* `images` in the directory `/valohai/inputs/images/` with the folder structure from your object data stores intact.
* `model` will be downloaded to the directory `/valohai/inputs/model/` and the file will always be renamed to `model.pkl`

#### Use in Python

In Python you'll access these files like any other file, as they'll be available locally on the machine.

```python
import os
from PIL import Image

# Path to downloaded model
model_path = "/valohai/inputs/model/model.pkl"

# Path to the directory where the images are downloaded
images_directory = "/valohai/inputs/images/"

# Loop through images in the directory
for root, dirs, files in os.walk(images_directory):
    for filename in files:
        image_path = os.path.join(root, filename)

        image_name = os.path.basename(image_path)
        image = Image.open(image_path)
        image.load()
```

That's it. No boto3, no credentials, no download loops.

<details>

<summary>Optional: Use the <code>valohai-utils</code> Python helper tool</summary>

The `valohai-utils` helper library offers a simpler syntax:

```python
import valohai
import os
from PIL import Image

# Path to downloaded model
model_path = valohai.inputs("model").path()

# Path to the directory where the images are downloaded
images_directory = valohai.inputs("images").dir_path()

# Loop through images in the directory
for root, dirs, files in os.walk(images_directory):
    for filename in files:
        image_path = os.path.join(root, filename)

        image_name = os.path.basename(image_path)
        image = Image.open(image_path)
        image.load()
```

</details>

### Common Patterns

#### Multiple Files with Wildcards

```yaml
inputs:
    - name: images
      default:
        - s3://mybucket/train/images/*.jpg
        - s3://mybucket/train/images/*.png
```

All matching files download to `/valohai/inputs/images/`.

#### Multiple Cloud Sources

```yaml
inputs:
    - name: data
      default:
        - s3://aws-bucket/data/*.parquet
        - azure://mycontainer/data/*.parquet
        - gs://gcs-bucket/data/*.parquet
```

Mix and match storage providers in one input. All the files will be downloaded under `/valohai/inputs/data/`.

> 💡 Files defined under the same input are downloaded to the same directory. If their names are not unique, they will override each other and only one of them will be available in the execution.

#### Single File with Rename

```yaml
inputs:
    - name: pretrained
      default: s3://models/bert-base.h5
      filename: model.h5  # Always save as this name
```

Access at `/valohai/inputs/pretrained/model.h5`.

#### Keep Directory Structure

`keep-directories` is used to define what folder structure should Valohai use in the inputs folder.

* **none:** (default) all files are downloaded to `/valohai/inputs/myinput`
* **full:** keeps the full path from the storage root. For example `s3://special-bucket/foo/bar/**.jpg` could end up as `/valohai/inputs/myinput/foo/bar/dataset1/a.jpg`
* **suffix:** keeps the suffix from the "wildcard root". For example `s3://special-bucket/foo/bar/*` the special-bucket/foo/bar/ would be removed, but any relative path after it would be kept, and you might end up with `/valohai/inputs/myinput/dataset1/a.jpg`

```yaml
inputs:
    - name: dataset
      default: s3://bucket/project/**/*.json
      keep-directories: suffix  # Preserves folder structure
```

### Override Inputs at Runtime

Default inputs are just starting points. Override them when running:

```shell
# CLI
vh execution run train-model --dataset=s3://different-bucket/experiment-data/*.csv

# Or use the web UI to browse and select different files
# Or pass different URLs via API
```

### Quick Reference

#### Define inputs in `valohai.yaml`

```yaml
inputs:
    - name: mydata
      default: s3://bucket/*.csv
# Access at: /valohai/inputs/mydata/file.csv
```

#### Use as local files

Inputs are available under `/valohai/inputs/{input-name}/` :

```python
import pandas as pd
import glob

# Files are already downloaded to /valohai/inputs/mydata/
csv_files = glob.glob("/valohai/inputs/mydata/*.csv")
```

#### Dynamic File Selection

Don't hardcode paths in YAML. Pass them at runtime:

```shell
vh execution run train --images=s3://bucket/client-xyz/images/*.jpg
```

#### Inputs can be overridden at runtime

```shell
vh execution run train-model --dataset=s3://different-bucket/experiment-data/*.csv

# Or in the UI / with the API
```

#### Options

* `filename: newname.ext` — Rename single input file on download
* `keep-directories: suffix` — Preserve folder structure

***
