# Deploy Real-Time Endpoint
Deploy a machine learning model as a RESTful API endpoint on Kubernetes for low-latency predictions.
**Important:** Valohai only creates endpoints from Git-versioned code. Commit and push your code before creating a deployment.
## Create the inference code
Build a FastAPI endpoint that loads your model and handles predictions.
Create `predict.py`:
```python
from io import BytesIO

from fastapi import FastAPI, File, UploadFile
from PIL import Image
import numpy
import tensorflow as tf

app = FastAPI()

model_path = 'model.h5'
loaded_model = None


@app.post("/{full_path:path}")
async def predict(image: UploadFile = File(...)):
    img = Image.open(BytesIO(await image.read()))

    # Preprocess for MNIST: 28x28 grayscale with a batch dimension of 1
    img = img.resize((28, 28)).convert('L')
    img_array = numpy.array(img)
    image_data = numpy.reshape(img_array, (1, 28, 28))

    # Load the model once, reuse it for subsequent requests
    global loaded_model
    if loaded_model is None:
        loaded_model = tf.keras.models.load_model(model_path)

    # Run prediction; argmax over the output picks the most likely digit
    prediction = numpy.argmax(loaded_model.predict(image_data), axis=-1)
    return f'Predicted_Digit: {prediction[0]}'
```

**Why this pattern:**
- Loading the model once improves response time for subsequent requests
- FastAPI handles asynchronous requests efficiently
- The catch-all path `/{full_path:path}` works with Valohai's URL routing (see the sketch below)
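To see the catch-all route in action, here's a self-contained sketch, separate from the inference code above (it requires `fastapi` plus `httpx` for the test client), showing that a POST to any path reaches the same handler:

```python
# Standalone demo of the catch-all route pattern; not the inference code.
from fastapi import FastAPI
from fastapi.testclient import TestClient

app = FastAPI()


@app.post("/{full_path:path}")
async def echo(full_path: str):
    # full_path captures whatever follows the leading slash
    return {"matched_path": full_path}


client = TestClient(app)
# Both requests hit the same handler, regardless of path
print(client.post("/").json())                # {'matched_path': ''}
print(client.post("/digits/predict").json())  # {'matched_path': 'digits/predict'}
```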
**Need a model?** Download a pre-trained MNIST model: `model.h5`
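If you'd rather produce one yourself, here's a minimal training sketch, assuming `tensorflow==2.5.1` (the version pinned in the next section). The architecture is illustrative and not necessarily what's behind the pre-trained download:

```python
# Minimal MNIST training sketch that saves a model.h5 compatible with
# the endpoint above. Assumes tensorflow==2.5.1.
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()

model = tf.keras.Sequential([
    # Scale raw 0-255 pixels inside the model, so the endpoint can send
    # unnormalized image data as-is
    tf.keras.layers.experimental.preprocessing.Rescaling(
        1.0 / 255, input_shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1)
model.save('model.h5')
```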
## Test locally
Validate your endpoint before deploying:
```bash
pip install tensorflow==2.5.1 fastapi uvicorn Pillow python-multipart
uvicorn predict:app --reload
```

Visit http://localhost:8000/docs to see FastAPI's interactive documentation.
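You can also exercise the endpoint from a script. This sketch (requires `requests` and `Pillow`) generates a blank 28x28 image in memory, so it runs without any test files; the prediction for a blank image is meaningless, so substitute a real MNIST image for a sensible result:

```python
# Post a generated 28x28 grayscale image to the locally running endpoint.
from io import BytesIO

import requests
from PIL import Image

buffer = BytesIO()
Image.new('L', (28, 28), color=0).save(buffer, format='PNG')
buffer.seek(0)

response = requests.post(
    'http://localhost:8000/',
    # 'image' must match the UploadFile parameter name in predict.py
    files={'image': ('digit.png', buffer, 'image/png')},
)
print(response.status_code, response.text)
```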
## Define the endpoint
Add this to your `valohai.yaml`:

```yaml
- endpoint:
    name: digits
    description: predict digits from image inputs
    image: tiangolo/uvicorn-gunicorn-fastapi:python3.7
    server-command: uvicorn predict:app --host 0.0.0.0 --port 8000
    files:
      - name: model
        description: Model output file from TensorFlow
        path: model.h5
```

**Configuration explained:**

- `image`: Base Docker environment (includes FastAPI + Uvicorn)
- `server-command`: How to start your HTTP server
- `files`: Model artifacts to include (`path` is where they'll be saved)
**Using multiple model files:** You can include multiple models in one endpoint:
```yaml
files:
  - name: classifier
    path: classifier.pkl
  - name: regressor
    path: regressor.pkl
```

Both files become available in your endpoint container at the specified paths.
## Installing additional packages
If your base image doesn't include all dependencies, create `requirements-deployment.txt`:

```
valohai-utils
tensorflow==2.5.1
Pillow
python-multipart
```

Valohai installs these when building your deployment.
**If Uvicorn isn't in your base image:** Update `server-command` to use the installed path:

```yaml
server-command: ~/.local/bin/uvicorn predict:app --host 0.0.0.0 --port 8000
```

**Scripts in subfolders:** Use Python module syntax, not file paths. For a script at `myfolder/predict.py`, use `myfolder.predict:app` instead of `myfolder/predict:app`.
## Push to Git
Commit your deployment code:
```bash
git add valohai.yaml predict.py requirements-deployment.txt
git commit -m "Add digit prediction endpoint"
git push
```

## Create the deployment
1. Open your project in Valohai
2. Click **Fetch repository** to pull your latest commit
3. Navigate to the **Deployment** tab
4. Click **Create deployment**
5. Name your deployment and select your deployment target (default: Valohai.Cloud)
6. Click **Create version**
7. Select the `digits` endpoint
8. Choose a `model.h5` file from your previous training runs
9. Click **Create version**
**Deployment status:** Watch the build progress. When it shows "100% - Available", your endpoint is ready.
## Test your endpoint
Verify it works from the Valohai UI:
1. Open your deployment
2. Click **Test deployment**
3. Select your `digits` endpoint
4. Add a field named `image` with type **File**
5. Upload a test image
6. Click **Send request**
You should see a response like: `Predicted_Digit: 7`
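You can also call the deployed endpoint from code. The URL below is a placeholder; copy the real one from your deployment's details page in Valohai (`test-digit.png` stands in for any local test image):

```python
# Call the deployed endpoint with the same multipart field used locally.
import requests

# Placeholder URL; replace with the endpoint URL shown in the Valohai UI
ENDPOINT_URL = 'https://valohai.cloud/<owner>/<project>/<deployment>/digits'

with open('test-digit.png', 'rb') as f:
    response = requests.post(ENDPOINT_URL, files={'image': f})

print(response.status_code, response.text)  # e.g. 200 Predicted_Digit: 7
```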
**Next steps:**

- Monitor your endpoint with custom metrics (a small sketch follows below)
- Set up aliases for production routing
- Handle route prefixes for path-based routing
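As a preview of the first item: Valohai's monitoring guide describes collecting deployment metrics from JSON printed to standard output. A minimal sketch of that pattern, assuming that mechanism (the helper and metric names here are illustrative examples):

```python
# Hypothetical helper: print one JSON object per request so Valohai's
# deployment monitoring can pick the values up as custom metrics
# (assumes the JSON-logging mechanism described in the monitoring guide).
import json
import time


def log_metrics(digit: int, started: float) -> None:
    print(json.dumps({
        'predicted_digit': digit,
        'inference_time_ms': round((time.time() - started) * 1000, 2),
    }))
```

Call it at the end of `predict()`, passing a `time.time()` captured at the start of the request.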