CSV Inference Example

Process CSV data with a pre-trained model using Valohai's execution system. This example uses TensorFlow 2.5.0 to run predictions on tabular data.

Batch inference runs as a standard Valohai execution, which means you can schedule it, trigger it via API, or chain it in pipelines.

What you'll need

Two files are available in public storage:

  • Model: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/model.zip

  • Data: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/data.csv

You don't need to download these manually. Valohai will fetch them when the job runs.

Inference code

This script loads a zipped model, processes CSV data, and outputs predictions as JSON metadata and a results file.

import json
from zipfile import ZipFile

import pandas as pd
import tensorflow as tf

# Extract and load the model from Valohai inputs
with ZipFile('/valohai/inputs/model/model.zip', 'r') as f:
    f.extractall()

model = tf.keras.models.load_model('model')

# Load CSV data from Valohai inputs
csv = pd.read_csv('/valohai/inputs/data/data.csv')
labels = csv.pop('target')
data = tf.data.Dataset.from_tensor_slices((dict(csv), labels))
batch_data = data.batch(batch_size=32)

# Run predictions
results = model.predict(batch_data)
  
# Build a results dictionary keyed by 1-based row number.
# Keys are integers here; json.dump stringifies them later: {"1": 0.375, "2": 0.76}
flattened_results = results.flatten()
indexed_results = enumerate(flattened_results, start=1)
metadata = dict(indexed_results)

# Print each result as Valohai metadata for tracking (one JSON object per line);
# float() converts NumPy float32 to a JSON-serializable native float
for value in metadata.values():
    print(json.dumps({"result": float(value)}))

# Save results to Valohai outputs
with open('/valohai/outputs/results.json', 'w') as f:
    # NumPy float32 values are not JSON-serializable, so fall back to str()
    json.dump(metadata, f, default=str)
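The flatten-enumerate-serialize pattern above can be tried in isolation with dummy predictions instead of real model output; this is a minimal sketch, not part of the inference script:

```python
import json

import numpy as np

# Dummy (n, 1) prediction array standing in for model.predict() output
results = np.array([[0.375], [0.76], [0.12]], dtype=np.float32)

# flatten() turns the (n, 1) array into a 1-D vector of scores
flattened = results.flatten()

# enumerate(..., start=1) keys each prediction by its 1-based row number
metadata = dict(enumerate(flattened, start=1))

# json.dumps stringifies the integer keys; float32 values need the str fallback
serialized = json.dumps(metadata, default=str)
print(serialized)  # {"1": "0.375", "2": "0.76", "3": "0.12"}
```

Note that both keys and values end up as strings in the JSON output, which is why a consumer of results.json has to convert them back.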

Key Valohai paths:

  • /valohai/inputs/model/ - Where input files land

  • /valohai/outputs/ - Where output files are saved and versioned
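If you want the same script to run both on Valohai and locally, one approach is to resolve these paths through environment variables with local fallbacks. This is a sketch that assumes the Valohai runtime sets VH_INPUTS_DIR and VH_OUTPUTS_DIR; the fallback directories are placeholders for your own layout:

```python
import os

# Resolve Valohai directories, falling back to local paths when the
# environment variables are absent (e.g. when running on a laptop)
INPUTS_DIR = os.getenv("VH_INPUTS_DIR", "./inputs")
OUTPUTS_DIR = os.getenv("VH_OUTPUTS_DIR", "./outputs")

# Input names from valohai.yaml map to subdirectories under the inputs dir
model_zip = os.path.join(INPUTS_DIR, "model", "model.zip")
data_csv = os.path.join(INPUTS_DIR, "data", "data.csv")
results_path = os.path.join(OUTPUTS_DIR, "results.json")
```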

Define the step

Add this to your valohai.yaml:

- step:
    name: csv-inference
    image: tensorflow/tensorflow:2.5.0
    command:
      - pip install pandas
      - python batch_inference_csv.py
    inputs:
      - name: model
        default: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/model.zip
      - name: data
        default: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/data.csv

Why these inputs work:

  • default provides a starting point, but you can override with any S3, GCS, or Azure URL

  • Input names (model, data) map to /valohai/inputs/{name}/ directories

Run the inference

Execute from your terminal:

vh execution run csv-inference --adhoc --watch

This streams logs to your terminal in real time.

Check your results

Once complete, find your outputs in the Outputs tab of the execution. You'll see:

  • results.json with all predictions

  • Execution metadata showing individual prediction values
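Because the script stringified both keys and float32 values, a consumer of results.json needs to convert them back to native types. A minimal sketch, using an inline sample payload that mirrors the format the inference script produces:

```python
import json

# Sample payload in the same shape as results.json: stringified row
# numbers mapped to stringified prediction scores
sample = '{"1": "0.375", "2": "0.76"}'

raw = json.loads(sample)

# Restore integer row keys and float prediction values
predictions = {int(k): float(v) for k, v in raw.items()}
print(predictions)  # {1: 0.375, 2: 0.76}
```

With a downloaded copy of results.json, you would replace the inline sample with `json.load(open("results.json"))`.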

Next: Learn how to schedule recurring inference runs or trigger via API.
