# CSV Inference Example

Process CSV data with a pre-trained model using Valohai's execution system. This example uses TensorFlow 2.5.0 to run predictions on tabular data.

Batch inference runs as a standard Valohai execution, which means you can schedule it, trigger it via API, or chain it in pipelines.

### What you'll need

Two files are available in public storage:

* **Model:** `s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/model.zip`
* **Data:** `s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/data.csv`

You don't need to download these manually. Valohai will fetch them when the job runs.

### Inference code

This script loads a zipped model, processes CSV data, and outputs predictions as JSON metadata and a results file.

```python
import json
from zipfile import ZipFile

import pandas as pd
import tensorflow as tf

# Extract and load the model from Valohai inputs
with ZipFile("/valohai/inputs/model/model.zip", "r") as f:
    f.extractall()

model = tf.keras.models.load_model("model")

# Load CSV data from Valohai inputs
csv = pd.read_csv("/valohai/inputs/data/data.csv")
labels = csv.pop("target")
data = tf.data.Dataset.from_tensor_slices((dict(csv), labels))
batch_data = data.batch(batch_size=32)

# Run predictions
results = model.predict(batch_data)

# Build results dictionary keyed by 1-based row index, e.g. {1: 0.375, 2: 0.76}
flattened_results = results.flatten()
indexed_results = enumerate(flattened_results, start=1)
metadata = dict(indexed_results)

# Print each result as Valohai metadata for tracking
for value in metadata.values():
    print(json.dumps({"result": str(value)}))

# Save results to Valohai outputs
with open("/valohai/outputs/results.json", "w") as f:
    # NumPy float32 values are not JSON serializable, so write them as strings
    json.dump(metadata, f, default=str)
```
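The post-processing at the end of the script can be pulled into small functions, which makes it testable without TensorFlow. The following is a minimal sketch rather than part of the original example: `index_predictions` and `metadata_lines` are hypothetical helper names, and the sketch converts each value with `float()` instead of stringifying it, so the JSON holds numbers. `float()` accepts NumPy `float32` scalars as well as plain Python floats.

```python
import json
from typing import Dict, Iterable, List


def index_predictions(values: Iterable[float], start: int = 1) -> Dict[int, float]:
    """Map a 1-based row index to each prediction, coerced to a built-in float."""
    return {index: float(value) for index, value in enumerate(values, start=start)}


def metadata_lines(results: Dict[int, float]) -> List[str]:
    """Build one JSON object per prediction, one per line when printed."""
    return [json.dumps({"result": value}) for value in results.values()]


if __name__ == "__main__":
    results = index_predictions([0.375, 0.76])
    for line in metadata_lines(results):
        print(line)
    # json.dumps turns the integer keys into strings: {"1": 0.375, "2": 0.76}
    print(json.dumps(results))
```

With the flattened array from `model.predict`, this would be called as `index_predictions(results.flatten())`.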

**Key Valohai paths:**

* `/valohai/inputs/{name}/` - Where each input's files land, for example `/valohai/inputs/model/` and `/valohai/inputs/data/`
* `/valohai/outputs/` - Where output files are saved and versioned
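Hard-coded paths work inside a Valohai execution but fail when the script runs anywhere else. Below is a minimal sketch of resolving them at runtime. It assumes the `VH_INPUTS_DIR` and `VH_OUTPUTS_DIR` environment variables are set inside the execution (verify this for your environment); the helper names `input_file` and `output_file` are hypothetical, and the fallbacks are the paths listed above.

```python
import os
from pathlib import Path


def input_file(input_name: str, filename: str) -> Path:
    """Return the path of a file under the named input directory."""
    # Assumption: VH_INPUTS_DIR is provided inside the execution; otherwise
    # fall back to the conventional /valohai/inputs location.
    base = Path(os.environ.get("VH_INPUTS_DIR", "/valohai/inputs"))
    return base / input_name / filename


def output_file(filename: str) -> Path:
    """Return the path of a file in the outputs directory."""
    # Assumption: VH_OUTPUTS_DIR is provided inside the execution.
    base = Path(os.environ.get("VH_OUTPUTS_DIR", "/valohai/outputs"))
    return base / filename
```

For local testing, you could point both variables at scratch directories and run the same script unchanged.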

### Define the step

Add this to your `valohai.yaml`:

```yaml
- step:
    name: csv-inference
    image: tensorflow/tensorflow:2.5.0
    command:
      - pip install pandas
      - python batch_inference_csv.py
    inputs:
      - name: model
        default: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/model.zip
      - name: data
        default: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/data.csv
```

**Why these inputs work:**

* `default` provides a starting point, but you can override with any S3, GCS, or Azure URL
* Input names (`model`, `data`) map to `/valohai/inputs/{name}/` directories
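The name-to-directory mapping can be sketched in a few lines of Python. This is an illustration only: `expected_input_path` is a hypothetical helper, and it assumes a single-file input keeps the filename from the end of its URL, as `model.zip` and `data.csv` do in this example.

```python
import posixpath
from urllib.parse import urlparse


def expected_input_path(input_name: str, url: str) -> str:
    """Guess the local path for an input URL: /valohai/inputs/{name}/{file}."""
    # Assumption: the downloaded file keeps the last path segment of the URL.
    filename = posixpath.basename(urlparse(url).path)
    return posixpath.join("/valohai/inputs", input_name, filename)


if __name__ == "__main__":
    url = "s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/model.zip"
    print(expected_input_path("model", url))
```

Under that assumption, overriding an input with a differently named file changes the local path, so the filenames hard-coded in the inference script would need to change too.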

### Run the inference

Execute from your terminal:

```shell
vh execution run csv-inference --adhoc --watch
```

`--adhoc` uploads your local working directory so you can run uncommitted code, and `--watch` streams the logs to your terminal in real time.

### Check your results

Once the execution completes, you'll find:

* `results.json` with all predictions, in the Outputs tab of the execution
* The individual prediction values, recorded as execution metadata
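Because the inference script writes each prediction as a string, a downloaded `results.json` needs a conversion step before you can do arithmetic on it. A minimal sketch, assuming the file has been saved locally; `load_predictions` is a hypothetical helper name:

```python
import json
from pathlib import Path
from typing import Dict


def load_predictions(path: str) -> Dict[int, float]:
    """Read results.json and convert string keys and values back to numbers."""
    raw = json.loads(Path(path).read_text())
    return {int(index): float(value) for index, value in raw.items()}
```

For example, `load_predictions("results.json")` returns a dictionary you can pass to `max()`, `sorted()`, or a pandas `Series`.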

**Next:** Learn how to [schedule recurring inference runs](https://github.com/valohai/dokuhai/blob/main/pipelines/trigger-pipeline.md) or [trigger via API](https://github.com/valohai/dokuhai/blob/main/api/trigger-execution.md).

