CSV Inference Example
Process CSV data with a pre-trained model using Valohai's execution system. This example uses TensorFlow 2.5.0 to run predictions on tabular data.
Batch inference runs as a standard Valohai execution, which means you can schedule it, trigger it via API, or chain it in pipelines.
What you'll need
Two files are available in public storage:
Model: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/model.zip
Data: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/data.csv
You don't need to download these manually. Valohai will fetch them when the job runs.
Inference code
This script loads a zipped model, processes CSV data, and outputs predictions as JSON metadata and a results file.
```python
import json
from zipfile import ZipFile

import pandas as pd
import tensorflow as tf

# Extract and load the model from Valohai inputs
with ZipFile('/valohai/inputs/model/model.zip', 'r') as f:
    f.extractall()
model = tf.keras.models.load_model('model')

# Load CSV data from Valohai inputs
csv = pd.read_csv('/valohai/inputs/data/data.csv')
labels = csv.pop('target')
data = tf.data.Dataset.from_tensor_slices((dict(csv), labels))
batch_data = data.batch(batch_size=32)

# Run predictions
results = model.predict(batch_data)

# Build an indexed results dictionary, e.g. {1: 0.375, 2: 0.76}
flattened_results = results.flatten()
indexed_results = enumerate(flattened_results, start=1)
metadata = dict(indexed_results)

# Print each result as Valohai metadata for tracking
for value in metadata.values():
    print(json.dumps({"result": str(value)}))

# Save results to Valohai outputs
with open('/valohai/outputs/results.json', 'w') as f:
    # NumPy float32 values need stringification for JSON
    json.dump(metadata, f, default=lambda v: str(v))
```

Key Valohai paths:
- `/valohai/inputs/model/` - where input files land
- `/valohai/outputs/` - where output files are saved and versioned
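The flatten/enumerate/stringify pattern used above can be checked in isolation. This sketch stands in for `model.predict` output with a small NumPy array of hypothetical scores and applies the same steps the script uses before writing `results.json`:

```python
import json
import numpy as np

# Stand-in for model.predict output: an (n, 1) array of float32 scores
results = np.array([[0.375], [0.76]], dtype=np.float32)

# Same steps as the inference script
flattened = results.flatten()                    # 1-D array of scores
metadata = dict(enumerate(flattened, start=1))   # {1: 0.375, 2: 0.76}

# int keys become JSON strings; float32 values go through default=str
payload = json.dumps(metadata, default=lambda v: str(v))
print(payload)
```

Note that `json.dumps` converts the integer keys to strings automatically, while the `default` callable handles the NumPy floats that the standard encoder rejects.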
Define the step
Add this to your valohai.yaml:
```yaml
- step:
    name: csv-inference
    image: tensorflow/tensorflow:2.5.0
    command:
      - pip install pandas
      - python batch_inference_csv.py
    inputs:
      - name: model
        default: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/model.zip
      - name: data
        default: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/data.csv
```

Why these inputs work:
- `default` provides a starting point, but you can override it with any S3, GCS, or Azure URL
- Input names (`model`, `data`) map to `/valohai/inputs/{name}/` directories
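The name-to-path mapping can be expressed as a tiny helper. The `input_path` function below is illustrative, not part of any Valohai SDK; it just spells out the convention:

```python
import os


def input_path(name: str, filename: str) -> str:
    """Build the path where Valohai places a named input file."""
    return os.path.join('/valohai/inputs', name, filename)


print(input_path('model', 'model.zip'))  # /valohai/inputs/model/model.zip
print(input_path('data', 'data.csv'))    # /valohai/inputs/data/data.csv
```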
Run the inference
Execute from your terminal:
```
vh execution run csv-inference --adhoc --watch
```
This streams logs to your terminal in real time.
Check your results
Once complete, find your outputs in the Outputs tab of the execution. You'll see:
- `results.json` with all predictions
- Execution metadata showing individual prediction values
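Because the script stringifies the NumPy values before writing, a downstream consumer should convert them back to floats when reading the file. A minimal sketch (the sample dictionary here is hypothetical, matching the shape the script writes):

```python
import json

# Hypothetical file contents matching what the inference script writes
sample = {"1": "0.375", "2": "0.76"}
with open('results.json', 'w') as f:
    json.dump(sample, f)

# Downstream: parse the saved predictions and restore numeric types
with open('results.json') as f:
    raw = json.load(f)

predictions = {int(k): float(v) for k, v in raw.items()}
print(predictions)  # {1: 0.375, 2: 0.76}
```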
Next: Learn how to schedule recurring inference runs or trigger via API.