Deploy Real-Time Endpoint

Deploy a machine learning model as a RESTful API endpoint on Kubernetes for low-latency predictions.

Important: Valohai only creates endpoints from Git-versioned code. Commit and push your code before creating a deployment.

Create the inference code

Build a FastAPI endpoint that loads your model and handles predictions.

Create predict.py:

from fastapi import FastAPI, File, UploadFile
import tensorflow as tf
import numpy
from PIL import Image
from io import BytesIO

app = FastAPI()

# Path of the model file declared in valohai.yaml (files -> path)
model_path = 'model.h5'
loaded_model = None

# Catch-all route so the endpoint responds regardless of the URL prefix
# Valohai places in front of it
@app.post("/{full_path:path}")
async def predict(image: UploadFile = File(...)):
    img = Image.open(BytesIO(await image.read()))

    # Preprocess for MNIST: 28x28 grayscale, shaped as a batch of one
    img = img.resize((28, 28)).convert('L')
    img_array = numpy.array(img)
    image_data = numpy.reshape(img_array, (1, 28, 28))

    # Load model once, reuse for subsequent requests
    global loaded_model
    if loaded_model is None:
        loaded_model = tf.keras.models.load_model(model_path)

    # Run prediction and take the most likely class
    # (argmax over the output avoids the deprecated predict_classes)
    prediction = numpy.argmax(loaded_model.predict(image_data), axis=-1)

    return f'Predicted_Digit: {prediction[0]}'

Why this pattern:

  • Loading the model once improves response time for subsequent requests

  • FastAPI handles async requests efficiently

  • The catch-all route /{full_path:path} works with Valohai's URL routing

Need a model? Download a pre-trained MNIST model: model.h5

Test locally

Validate your endpoint before deploying:

pip install tensorflow==2.5.1 fastapi uvicorn Pillow python-multipart

uvicorn predict:app --reload

Visit http://localhost:8000/docs to see FastAPI's interactive documentation.
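You can also exercise the running server from a short script. The sketch below uses the requests library (install it locally with pip install requests); digit.png is a hypothetical test image of a handwritten digit:

import requests

# Send a test image to the local endpoint; the field name must be "image"
# to match the UploadFile parameter in predict.py.
with open('digit.png', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/',
        files={'image': ('digit.png', f, 'image/png')},
    )

print(response.status_code)
print(response.text)  # e.g. "Predicted_Digit: 7"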

Define the endpoint

Add this to your valohai.yaml:

- endpoint:
    name: digits
    description: predict digits from image inputs
    image: tiangolo/uvicorn-gunicorn-fastapi:python3.7
    server-command: uvicorn predict:app --host 0.0.0.0 --port 8000
    files:
      - name: model
        description: Model output file from TensorFlow
        path: model.h5

Configuration explained:

  • image - Base Docker environment (includes FastAPI + Uvicorn)

  • server-command - How to start your HTTP server

  • files - Model artifacts to include (path is where they'll be saved)

Using multiple model files: You can include multiple models in one endpoint:

files:
  - name: classifier
    path: classifier.pkl
  - name: regressor
    path: regressor.pkl

Both files become available in your endpoint container at the specified paths.
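How you load them is up to your inference code. A minimal sketch, assuming classifier.pkl and regressor.pkl are pickled models; adjust the loading calls to match however your models were actually serialized:

import pickle

# Paths match the `path` values declared under files in valohai.yaml.
with open('classifier.pkl', 'rb') as f:
    classifier = pickle.load(f)

with open('regressor.pkl', 'rb') as f:
    regressor = pickle.load(f)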

Installing additional packages

If your base image doesn't include all dependencies, create requirements-deployment.txt:

valohai-utils
tensorflow==2.5.1
Pillow
python-multipart

Valohai installs these when building your deployment.

If Uvicorn isn't in your base image: Update server-command to use the installed path:

server-command: ~/.local/bin/uvicorn predict:app --host 0.0.0.0 --port 8000

Scripts in subfolders: Use Python module syntax, not file paths. For a script at myfolder/predict.py, use myfolder.predict:app instead of myfolder/predict:app.

Push to Git

Commit your deployment code:

git add valohai.yaml predict.py requirements-deployment.txt
git commit -m "Add digit prediction endpoint"
git push

Create the deployment

  1. Open your project in Valohai

  2. Click Fetch repository to pull your latest commit

  3. Navigate to the Deployment tab

  4. Click Create deployment

  5. Name your deployment and select your deployment target (default: Valohai.Cloud)

  6. Click Create version

  7. Select the digits endpoint

  8. Choose a model.h5 file from your previous training runs

  9. Click Create version

Deployment status: Watch the build progress. When it shows "100% - Available", your endpoint is ready.

Test your endpoint

Verify it works from the Valohai UI:

  1. Open your deployment

  2. Click Test deployment

  3. Select your digits endpoint

  4. Add a field named image with type File

  5. Upload a test image

  6. Click Send request


You should see a response like: Predicted_Digit: 7
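You can also call the deployed endpoint programmatically. A minimal sketch using the requests library; the URL below is a placeholder you should replace with the endpoint URL shown on your deployment's page, and digit.png is again a hypothetical test image:

import requests

# Placeholder - copy the real endpoint URL from the deployment page in the Valohai UI.
ENDPOINT_URL = 'https://<your-deployment-endpoint-url>'

with open('digit.png', 'rb') as f:
    response = requests.post(ENDPOINT_URL, files={'image': f})

print(response.text)  # e.g. "Predicted_Digit: 7"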

