Deploy Real-Time Endpoint

Deploy a machine learning model as a RESTful API endpoint on Kubernetes for low-latency predictions.

Important: Valohai only creates endpoints from Git-versioned code. Commit and push your code before creating a deployment.

Create the inference code

Build a FastAPI endpoint that loads your model and handles predictions.

Create predict.py:

from fastapi import FastAPI, File, UploadFile
import tensorflow as tf
import numpy
from PIL import Image
from io import BytesIO
 
app = FastAPI()
 
model_path = 'model.h5'
loaded_model = None
 
@app.post("{full_path:path}")
async def predict(image: UploadFile = File(...)):
    img = Image.open(BytesIO(await image.read()))
 
    # Preprocess for MNIST
    img = img.resize((28, 28)).convert('L')
    img_array = numpy.array(img)
    image_data = numpy.reshape(img_array, (1, 28, 28))
 
    # Load model once, reuse for subsequent requests
    global loaded_model
    if loaded_model is None:
        loaded_model = tf.keras.models.load_model(model_path)
 
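    # Run prediction: pick the class with the highest predicted probability
    # (predict_classes is deprecated and was removed in TensorFlow 2.6)
    prediction = numpy.argmax(loaded_model.predict(image_data), axis=-1)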
 
    return f'Predicted_Digit: {prediction[0]}'

Why this pattern:

Loading the model once improves response time for subsequent requests
FastAPI handles async requests efficiently
The catch-all path {full_path:path} works with Valohai's URL routing
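
The load-once pattern can be tried without TensorFlow. The following is a minimal sketch, not the code Valohai runs: load_model is a hypothetical stand-in for tf.keras.models.load_model, and load_count exists only to show how often the loader is called.

```python
# Hypothetical stand-in for tf.keras.models.load_model; counts how often it runs.
load_count = 0


def load_model(path):
    global load_count
    load_count += 1
    return {"path": path}  # placeholder object standing in for a real model


loaded_model = None


def get_model(path="model.h5"):
    """Return the cached model, loading it on the first call only."""
    global loaded_model
    if loaded_model is None:
        loaded_model = load_model(path)
    return loaded_model


first = get_model()   # triggers the load
second = get_model()  # reuses the cached object
print(first is second, load_count)
```

Both calls return the same object and the loader runs a single time, so only the first request pays the loading cost.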

Need a model? Download a pre-trained MNIST model: model.h5

Test locally

Validate your endpoint before deploying:

pip install tensorflow==2.5.1 fastapi uvicorn Pillow python-multipart

uvicorn --reload predict:app

Visit http://localhost:8000/docs to see FastAPI's interactive documentation.
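
You can also call the local endpoint from a script. The sketch below uses only the Python standard library to build a multipart/form-data POST with a single file field named image, matching the parameter name in predict.py. The helper names, the digit.png file name, and the URL path are assumptions for illustration; the catch-all route should accept any path.

```python
import urllib.request
import uuid


def build_image_request(url, image_bytes, filename="digit.png"):
    """Build a multipart/form-data POST with one file field named 'image'."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        b"--" + boundary.encode(),
        b'Content-Disposition: form-data; name="image"; filename="'
        + filename.encode() + b'"',
        b"Content-Type: application/octet-stream",
        b"",
        image_bytes,
        b"--" + boundary.encode() + b"--",
        b"",
    ])
    headers = {"Content-Type": "multipart/form-data; boundary=" + boundary}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send_image(path, url="http://localhost:8000/"):
    """Send the image file at `path` and return the response body as text.

    Assumes the uvicorn server is already running on port 8000.
    """
    with open(path, "rb") as f:
        request = build_image_request(url, f.read())
    with urllib.request.urlopen(request) as response:
        return response.read().decode()


# Example usage (requires a running server and a local image file):
# print(send_image("digit.png"))
```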

Define the endpoint

Add this to your valohai.yaml:

- endpoint:
    name: digits
    description: predict digits from image inputs
    image: tiangolo/uvicorn-gunicorn-fastapi:python3.7
    server-command: uvicorn predict:app --host 0.0.0.0 --port 8000
    files:
      - name: model
        description: Model output file from TensorFlow
        path: model.h5

Configuration explained:

image - Base Docker environment (includes FastAPI + Uvicorn)
server-command - How to start your HTTP server
files - Model artifacts to include (path is where they'll be saved)

Using multiple model files: You can include multiple models in one endpoint:

files:
  - name: classifier
    path: classifier.pkl
  - name: regressor
    path: regressor.pkl

Both files become available in your endpoint container at the specified paths.

Installing additional packages

If your base image doesn't include all dependencies, create requirements-deployment.txt:

valohai-utils
tensorflow==2.5.1
Pillow
python-multipart

Valohai installs these when building your deployment.

If Uvicorn isn't in your base image: Update server-command to use the installed path:

server-command: ~/.local/bin/uvicorn predict:app --host 0.0.0.0 --port 8000

Scripts in subfolders: Use Python module syntax, not file paths. For a script at myfolder/predict.py, use myfolder.predict:app instead of myfolder/predict:app.
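
The conversion from a file path to module syntax can be sketched with a small helper. The function name uvicorn_target is hypothetical and not part of Valohai or Uvicorn; it only illustrates the rule above.

```python
from pathlib import PurePosixPath


def uvicorn_target(script_path, app_name="app"):
    """Convert 'myfolder/predict.py' into 'myfolder.predict:app'."""
    # Drop the .py suffix, then join the path components with dots
    module = ".".join(PurePosixPath(script_path).with_suffix("").parts)
    return f"{module}:{app_name}"


print(uvicorn_target("myfolder/predict.py"))  # prints myfolder.predict:app
```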

Push to Git

Commit your deployment code:

git add valohai.yaml predict.py requirements-deployment.txt
git commit -m "Add digit prediction endpoint"
git push

Create the deployment

Open your project in Valohai
Click Fetch repository to pull your latest commit
Navigate to the Deployment tab
Click Create deployment
Name your deployment and select your deployment target (default: Valohai.Cloud)
Click Create version
Select the digits endpoint
Choose a model.h5 file from your previous training runs
Click Create version

Deployment status: Watch the build progress. When it shows "100% - Available", your endpoint is ready.

Test your endpoint

Verify it works from the Valohai UI:

Open your deployment
Click Test deployment
Select your digits endpoint
Add a field named image with type File
Upload a test image
Click Send request

You should see a response like: Predicted_Digit: 7

Next steps:

Monitor your endpoint with custom metrics
Set up aliases for production routing
Handle route prefixes for path-based routing
