Batch Inference with Image Data

In this tutorial you will learn how to create and run a Batch Inference execution in Valohai. This execution will use TensorFlow 2.4.1 to run inference on ten unique images using a pre-trained MNIST model.

Prerequisites

For this tutorial you will need:

  • Python 3.6 or newer

  • Valohai command-line client (Run pip install --upgrade valohai-cli)

We’re also going to need two files:

  • an MNIST model trained with TensorFlow 2.4.1

  • some images to classify, packaged in a single zip file

To make it easy for you, both files are already hosted by Valohai and used as input defaults in the configuration below, so there is no need to download them yourself.

If you want to, you can train the required model by following our Valohai 101 tutorial.

Running on Valohai

To make running our batch inference on Valohai as easy as possible, we will use Valohai to run our code from the very beginning.

If you don’t already have a Valohai account, go to https://app.valohai.com/ to create one for yourself.

Create a new folder for the project, then run the following commands inside it:

vh login
# fill in your username
# and your password

vh init
# Answer the wizard questions like this:
# "First, let's..." -> y
# "Looks like..." -> python batch_inference.py, then y to confirm
# "Choose a number or..." -> tensorflow/tensorflow:2.4.1, then y to confirm
# "Write this to..." -> y
# "Do you want to link..." -> C, then give a name for your project, then select your user

Edit the generated valohai.yaml so that it looks like this:

---

- step:
    name: Batch Inference
    image: tensorflow/tensorflow:2.4.1
    command:
    - pip install pillow valohai-utils
    - python batch_inference.py
    inputs:
    - name: model
      default: s3://valohai-public-files/tutorials/batch-inference/image-batch-inference/model.h5
    - name: images
      default: s3://valohai-public-files/tutorials/batch-inference/image-batch-inference/images.zip

Here we define a single step for our machine learning pipeline: the Batch Inference step. It runs on top of the official tensorflow/tensorflow:2.4.1 Docker image, first installing the pillow and valohai-utils Python libraries and then running our batch inference code.

Let’s test that everything is set up correctly by running on Valohai:

vh exec run --adhoc "Batch Inference"

If everything went as planned, we should see our Valohai execution fail once it notices that batch_inference.py is missing:

Error, but success!

Unpacking the Images

…actually, the Valohai helper library (valohai-utils) unpacks the images automatically, so we can simply fetch their locations with vh.inputs('images').paths()!
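
For example, once an execution is running with the images input defined above, a minimal sketch like this would print the location of every extracted image:

import valohai as vh

# The 'images' input is a zip archive; valohai-utils extracts it and
# paths() yields the local path of each extracted file.
for image_path in vh.inputs('images').paths():
    print(image_path)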

Loading and Using Our Model

Okay, let’s get started by creating batch_inference.py and opening it up in your favorite editor!

Add these imports to the beginning of the file:

import json
import os

import numpy as np
from PIL import Image
import tensorflow as tf
import valohai as vh

We are using the valohai-utils library to make integration with Valohai straightforward.

Begin by loading our model:

model = tf.keras.models.load_model(vh.inputs('model').path())
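
If you want a quick, optional confirmation that the model loaded correctly, you can print its architecture:

# Optional sanity check: prints the layers of the loaded Keras model
model.summary()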

Easy, huh? Let’s define a function to load up an image and apply preprocessing.

As we are using the MNIST dataset, our preprocessing will:

  • Convert our images to grayscale

  • Resize the images to 28x28 pixels

  • Normalize the image levels from 0–255 to 0.0–1.0

Let’s go!

def load_image(image_path):
    image_name = os.path.basename(image_path)
    image = Image.open(image_path)
    image.load()

    # Convert to a 28x28 grayscale image, add a batch dimension,
    # and scale pixel values from 0–255 down to 0.0–1.0
    image = image.resize((28, 28)).convert('L')
    image_data = np.array(image).reshape(1, 28, 28)
    image_data = image_data / 255.0

    return (image_name, image_data)
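
If you want to sanity-check load_image locally, a sketch like this (the file name is just a hypothetical example, not part of the tutorial data) should show the (1, 28, 28) shape that model.predict expects:

# 'some_digit.png' is a hypothetical example file for local testing.
name, data = load_image('some_digit.png')
print(name, data.shape)  # e.g. some_digit.png (1, 28, 28)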

Then define a function to run inference on an image:

def run_inference(image):
    image_name, image_data = image
    # Pick the digit class with the highest predicted score
    prediction = np.argmax(model.predict(image_data))

    with vh.logger() as logger:
        logger.log('image', image_name)
        logger.log('inferred_digit', prediction)

    return {
        'image': image_name,
        'inferred_digit': str(prediction),
    }

Finally, let’s run these functions for all the images. While we already log all the individual inference results in the inference function, let’s also save the complete results in a single JSON file.

results = []
for path in vh.inputs('images').paths():
    results.append(run_inference(load_image(path)))

with open(vh.outputs().path('results.json'), 'w') as f:
    json.dump(results, f)
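
Based on the dictionaries returned by run_inference, results.json ends up as a list with one entry per image, roughly along these lines (the file names and digits below are purely illustrative):

[
    {"image": "digit_01.png", "inferred_digit": "3"},
    {"image": "digit_02.png", "inferred_digit": "7"}
]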

Let’s run the batch inference on Valohai:

vh exec run --adhoc "Batch Inference"

If everything went according to plan, you can now preview the results in the Outputs tab:

Results of our batch inference execution