CSV Inference Example

Process CSV data with a pre-trained model using Valohai's execution system. This example uses TensorFlow 2.5.0 to run predictions on tabular data.

Batch inference runs as a standard Valohai execution, which means you can schedule it, trigger it via API, or chain it in pipelines.

What you'll need

Two files are available in public storage:

  • Model: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/model.zip

  • Data: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/data.csv

You don't need to download these manually. Valohai will fetch them when the job runs.

Inference code

This script loads a zipped model, processes CSV data, and outputs predictions as JSON metadata and a results file.

import json
from zipfile import ZipFile

import pandas as pd
import tensorflow as tf

# Extract and load the model from Valohai inputs
with ZipFile("/valohai/inputs/model/model.zip", "r") as f:
    f.extractall()

model = tf.keras.models.load_model("model")

# Load CSV data from Valohai inputs
csv = pd.read_csv("/valohai/inputs/data/data.csv")
labels = csv.pop("target")
data = tf.data.Dataset.from_tensor_slices((dict(csv), labels))
batch_data = data.batch(batch_size=32)

# Run predictions
results = model.predict(batch_data)

# Build a results dictionary keyed by row index, e.g. {1: 0.375, 2: 0.76, ...}
flattened_results = results.flatten()
indexed_results = enumerate(flattened_results, start=1)
metadata = dict(indexed_results)

# Print each result as Valohai metadata for tracking
for value in metadata.values():
    print(json.dumps({"result": str(value)}))

# Save results to Valohai outputs
with open("/valohai/outputs/results.json", "w") as f:
    # NumPy float32 values need stringification for JSON
    json.dump(metadata, f, default=lambda v: str(v))

Key Valohai paths:

  • /valohai/inputs/model/ - Where the files for the model input are downloaded before your code runs

  • /valohai/outputs/ - Files saved here are uploaded and versioned when the execution finishes
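
If you want to confirm what landed where, you can list the input directories at the start of your script; with this example's inputs the output should be along these lines:

import os

print(os.listdir("/valohai/inputs/model"))  # e.g. ['model.zip']
print(os.listdir("/valohai/inputs/data"))   # e.g. ['data.csv']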

Define the step

Add this to your valohai.yaml:
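
A minimal step definition along these lines should work. The step name, Docker image tag, and script filename (batch_inference.py) are assumptions here; adjust them to match your project:

- step:
    name: batch-inference
    image: tensorflow/tensorflow:2.5.0
    command:
      - pip install pandas
      - python ./batch_inference.py
    inputs:
      - name: model
        default: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/model.zip
      - name: data
        default: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/data.csv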

Why these inputs work:

  • default provides a starting point, but you can override it with any S3, GCS, or Azure URL when you launch a run (an example follows this list)

  • Input names (model, data) map to /valohai/inputs/{name}/ directories
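
For example, to point the data input at your own file for a single run, you can pass the input name as an option when launching with the Valohai CLI. The bucket URL below is hypothetical, and the exact override syntax may vary between CLI versions:

vh execution run batch-inference --adhoc --data=s3://my-bucket/inference/new-data.csv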

Run the inference

Execute from your terminal:
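
Assuming the Valohai CLI is installed and your project is linked, a command like this launches the step defined above; --adhoc packages and uploads your current working directory, and --watch follows the execution:

vh execution run batch-inference --adhoc --watch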

This streams logs to your terminal in real time.

Check your results

Once the execution completes, find your files in the Outputs tab of the execution and the logged values in the Metadata tab. You'll see:

  • results.json with all predictions (a sample is shown below)

  • The individual prediction values collected from the JSON lines the script printed
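
Because the script stringifies the NumPy float32 values before serializing, results.json will look roughly like this (the values shown are placeholders):

{
  "1": "0.375",
  "2": "0.76",
  "3": "0.128"
}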

Next: Learn how to schedule recurring inference runs or trigger them via the API.
