# Working with Data

> 💡 **About this tutorial:** We use YOLOv8 as a practical example to demonstrate Valohai's features. You don't need computer vision knowledge—the patterns you learn here apply to any ML framework. This tutorial focuses on ingesting data, running inference, and saving resulting files while ensuring proper versioning and tracking of your ML workflows.

### Define Inputs in valohai.yaml

Add input groups to any step in your `valohai.yaml`:

```yaml
- step:
    name: inference
    image: docker.io/ultralytics/ultralytics:8.0.180-python
    command: python inference.py
    inputs:
        - name: model
          default: datum://latest-model  # References data from your catalog
          filename: best.onnx            # Renames the file when downloaded
        - name: images
          default: https://ultralytics.com/images/bus.jpg
```

#### Input Configuration Options

* **name**: Identifier for accessing files at `/valohai/inputs/{name}/`
* **default**: Pre-filled source (URL, S3 path, or `datum://` reference)
* **filename**: Rename downloaded files for consistent access

> 💡 *Use `datum://` to reference any file in your project's data catalog, including outputs from previous executions.*
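
Because `name` and `filename` together fully determine where a file lands, the path can be computed before the execution even starts. A minimal helper (our own illustration, not part of any Valohai SDK) makes the convention explicit:

```python
import os

def input_path(input_name: str, filename: str, root: str = "/valohai/inputs") -> str:
    """Return the path where Valohai places a downloaded input file.

    With `filename` set in valohai.yaml, the full path is known
    before the execution starts.
    """
    return os.path.join(root, input_name, filename)

# With the step defined above, the model ends up at:
# input_path("model", "best.onnx") -> "/valohai/inputs/model/best.onnx"
```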

### Access Inputs in Your Code

Valohai downloads all inputs to `/valohai/inputs/{input-name}/` before your code runs:

```python
from ultralytics import YOLO
import os

# Inputs are always at predictable paths
path_to_model = "/valohai/inputs/model/best.onnx"
path_to_images = "/valohai/inputs/images/"

model = YOLO(path_to_model)

# Process all files in the images input
for image in os.listdir(path_to_images):
    image_path = os.path.join(path_to_images, image)

    if os.path.isfile(image_path):
        results = model.predict(image_path, save=True, project="/valohai/outputs", name="predictions")

        for r in results:
            print(r.boxes)
```

### Run with Inputs

Execute your step and Valohai handles the downloads:

```shell
vh execution run inference --adhoc --open-browser
```

On the execution page, the **Inputs** section shows:

* Source locations (URLs, S3 paths, or datum links)
* File previews for supported formats
* Download status and file sizes

### Override Inputs at Runtime

#### From the UI

1. Copy any execution using the "Copy" button
2. Modify inputs in the form:
   * Add URLs directly
   * Browse and select from your data catalog
   * Reference outputs from other executions
3. Create the new execution

#### From the CLI

Override default inputs without modifying `valohai.yaml`:

```shell
# Single file
vh execution run inference --adhoc --model=s3://my-bucket/model.onnx

# Multiple files for one input
vh execution run inference --adhoc --images=https://example.com/img1.jpg --images=https://example.com/img2.jpg
```

### Input Sources

Valohai accepts inputs from:

* **Direct URLs**: Any publicly accessible HTTP/HTTPS endpoint
* **Cloud storage**: S3, Azure Blob, GCS (with configured credentials)
* **Data catalog**: Files uploaded to your project or outputs from executions
* **Version control**: Files committed to your repository
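
Each source type goes in the same `default:` field. A hypothetical step mixing them might look like this (bucket, object, and datum names are placeholders; check your configured data stores for the exact URL schemes):

```yaml
- step:
    name: sources-demo
    image: python:3.11
    command: python demo.py
    inputs:
        - name: public-file
          default: https://example.com/data.csv
        - name: from-s3
          default: s3://my-bucket/datasets/train.parquet
        - name: from-gcs
          default: gs://my-bucket/datasets/labels.json
        - name: from-catalog
          default: datum://latest-dataset
```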

#### Connecting Executions

Use outputs from one job as inputs to another:

1. Train a model → saves to `/valohai/outputs/`
2. Reference it with `datum://` in your next step
3. Valohai tracks the lineage automatically

This creates traceable pipelines where you can see exactly which model version produced which predictions.

#### 💬 Best Practices

* Use descriptive input names that indicate the expected file type
* Set sensible defaults in `valohai.yaml` for common use cases
* Override inputs at runtime for experimentation
* Check file existence before processing to handle optional inputs gracefully
