Batch Inference with CSV Data

In this tutorial you will learn how to create and run a Batch Inference execution in Valohai. This execution will use TensorFlow 2.5.0 to run new CSV data through a previously trained model.

Prerequisites

For this tutorial you will need:

  • Python 3.6 or newer

  • Valohai command-line client (install it with pip install --upgrade valohai-cli)

We’re also going to need two files:

  • a model trained with TensorFlow 2.5.0

  • some new data in a single CSV file

To make things easy, both files are already hosted publicly; you don't need to download them, as our step configuration below will reference them directly.

If you want to, you can train the required model by following the Keras example here: https://keras.io/examples/structured_data/structured_data_classification_from_scratch/.
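If you do train the model yourself, you will also need to zip it up and export some rows to score. The sketch below shows one way to do that; it assumes you still have the trained Keras model in a variable called model and the heart disease rows in a pandas DataFrame called dataframe (rename these to match your own script).

# A minimal sketch, not part of the tutorial code. Adjust the `model` and
# `dataframe` names to whatever your training script actually used.
import shutil

# Save the model in TensorFlow's SavedModel format, then zip the folder so it
# matches the model.zip this tutorial expects.
model.save('model')
shutil.make_archive('model', 'zip', 'model')  # produces model.zip

# A handful of rows is enough to act as "new" data. Keeping the target column
# is fine, because our inference script pops it off before predicting.
dataframe.head(20).to_csv('data.csv', index=False)

You would then host model.zip and data.csv somewhere Valohai can reach and point the input defaults at them, but for this tutorial you can simply keep using the copies we host for you (see the defaults in valohai.yaml below).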

Running on Valohai

To keep things simple, we will use Valohai to run our code from the very beginning.

If you don’t already have a Valohai account, go to https://app.valohai.com/ to create one for yourself.

Create a new folder for our project, then run the following commands in the project folder:

vh login
# fill in your username
# and your password

vh init
# Answer the wizard questions like this:
# "First, let's..." -> y
# "Looks like..." -> python batch_inference.py, then y to confirm
# "Choose a number or..." -> tensorflow/tensorflow:2.5.0, then y to confirm
# "Write this to..." -> y
# "Do you want to link..." -> C, then give a name for your project, then select your user

Edit the generated valohai.yaml so that it looks like this:

---

- step:
    name: Batch Inference
    image: tensorflow/tensorflow:2.5.0
    command:
    - pip install pandas valohai-utils
    - python batch_inference.py
    inputs:
    - name: model
      default: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/model.zip
    - name: data
      default: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/data.csv

What we are doing here is defining a single step for our machine learning pipeline, the Batch Inference step. It runs on top of the official tensorflow/tensorflow:2.5.0 Docker image, first installs pandas and the valohai-utils Python library, and then runs our batch inference code. Its two inputs, the model archive and the CSV data, are fetched for us before the commands start.
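As a side note, Valohai downloads each input onto the worker machine before our commands run, and valohai-utils resolves the local file path for us at runtime. A tiny illustration (the printed path is an example of the usual layout, not something you need to hard-code):

import valohai as vh

# The CSV input defined above becomes a local file inside the execution;
# this typically prints something like /valohai/inputs/data/data.csv.
print(vh.inputs('data').path())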

Let’s test that everything is set up correctly by running on Valohai:

vh exec run --adhoc "Batch Inference"

If everything went as planned, we should see our Valohai execution end after finding out that batch_inference.py is missing:

Error, but success!

Unpacking the Model

Today we are unpacking the model ourselves. Let’s get started by creating and opening up batch_inference.py in your favorite editor!

Add these imports to the beginning of the file:

import json
from zipfile import ZipFile

import pandas as pd
import tensorflow as tf
import valohai as vh

For unpacking the model, we will only need zipfile and valohai, but we will use the rest of the imports soon enough.

Next, unpack the model to a folder called model in the current working directory:

# process_archives=False gives us the path to the raw model.zip itself,
# instead of letting valohai-utils extract the archive for us.
with ZipFile(vh.inputs('model').path(process_archives=False), 'r') as f:
    f.extractall('model')

Done!

Loading and Using Our Model

Begin by loading our model:

model = tf.keras.models.load_model('model')

Easy, huh? Let’s load up the data:

csv = pd.read_csv(vh.inputs('data').path())
labels = csv.pop('target')
data = tf.data.Dataset.from_tensor_slices((dict(csv), labels))
batch_data = data.batch(batch_size=32)

Aaand we are almost done. Run the model with the loaded data and, while we're at it, log the results and save them as a JSON file:

results = model.predict(batch_data)

# Let's build a dictionary out of the results,
# e.g. {"1": 0.375, "2": 0.76}
flattened_results = results.flatten()
indexed_results = enumerate(flattened_results, start=1)
metadata = dict(indexed_results)

# Each `with` block flushes one metadata entry when it exits, so every
# prediction shows up as its own logged row in the Valohai UI.
for value in metadata.values():
    with vh.logger() as logger:
        logger.log("result", value)

with open(vh.outputs().path('results.json'), 'w') as f:
    # The JSON library doesn't know how to print
    # NumPy float32 values, so we stringify them
    json.dump(metadata, f, default=lambda v: str(v))

Let’s run the batch inference on Valohai:

vh exec run --adhoc "Batch Inference"

If everything went according to plan, you can now preview the results in the Outputs tab:

Results of our batch inference execution