# Deploy Real-Time Endpoint

Deploy a machine learning model as a RESTful API endpoint on Kubernetes for low-latency predictions.

> **Important:** Valohai only creates endpoints from Git-versioned code. Commit and push your code before creating a deployment.

### Create the inference code

Build a FastAPI endpoint that loads your model and handles predictions.

Create `predict.py`:

```python
from fastapi import FastAPI, File, UploadFile
import tensorflow as tf
import numpy
from PIL import Image
from io import BytesIO

app = FastAPI()

model_path = "model.h5"
loaded_model = None


@app.post("{full_path:path}")
async def predict(image: UploadFile = File(...)):
    img = Image.open(BytesIO(await image.read()))

    # Preprocess for MNIST
    img = img.resize((28, 28)).convert("L")
    img_array = numpy.array(img)
    image_data = numpy.reshape(img_array, (1, 28, 28))

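    # Load model once, reuse for subsequent requests
    global loaded_model
    if loaded_model is None:
        loaded_model = tf.keras.models.load_model(model_path)

    # Run prediction: pick the class with the highest score.
    # (Sequential.predict_classes is deprecated in TensorFlow 2.5 and removed in 2.6.)
    prediction = numpy.argmax(loaded_model.predict(image_data), axis=-1)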

    return f"Predicted_Digit: {prediction[0]}"
```

**Why this pattern:**

* Loading the model once improves response time for subsequent requests
* FastAPI handles async requests efficiently
* The catch-all path `{full_path:path}` works with Valohai's URL routing
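
The load-once behavior can also be expressed with `functools.lru_cache` instead of a `global` variable. The following is a minimal sketch rather than the tutorial's own code: `load_model_from_disk` is a hypothetical stand-in for `tf.keras.models.load_model`, and `load_calls` exists only to make the caching visible.

```python
from functools import lru_cache

# Counts how many times the (expensive) load actually runs.
load_calls = 0


def load_model_from_disk(path: str) -> dict:
    """Hypothetical stand-in for tf.keras.models.load_model(path)."""
    global load_calls
    load_calls += 1
    return {"path": path}


@lru_cache(maxsize=1)
def get_model(path: str = "model.h5") -> dict:
    """Load the model on first use and return the cached object afterwards."""
    return load_model_from_disk(path)


first = get_model()   # triggers the load
second = get_model()  # served from the cache
```

Both calls return the same object, so only the first request pays the loading cost.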

> **Need a model?** Download a pre-trained MNIST model: [model.h5](https://valohai-public-files.s3.eu-west-1.amazonaws.com/tutorials/inference/model.h5)

### Test locally

Validate your endpoint before deploying:

```shell
pip install tensorflow==2.5.1 fastapi uvicorn Pillow python-multipart

uvicorn --reload predict:app
```

Visit `http://localhost:8000/docs` to see FastAPI's interactive documentation.
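
To exercise the endpoint from a script instead of the browser, you can post an image as `multipart/form-data`. The sketch below uses only the standard library and assumes the server is running on `localhost:8000`. The helper names `build_multipart` and `send_image` and the file name `digit.png` are illustrative, not part of Valohai or FastAPI.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/octet-stream"):
    """Encode one file as a multipart/form-data body.

    Returns a (body, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def send_image(url: str, image_path: str):
    """POST an image file in the `image` field and return the decoded JSON response."""
    with open(image_path, "rb") as f:
        body, header = build_multipart(
            "image", os.path.basename(image_path), f.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs the local server running and an image file; the name is assumed):
# print(send_image("http://localhost:8000/", "digit.png"))
```

The field name `image` matches the `image` parameter of the `predict` function in `predict.py`.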

### Define the endpoint

Add this to your `valohai.yaml`:

```yaml
- endpoint:
    name: digits
    description: predict digits from image inputs
    image: tiangolo/uvicorn-gunicorn-fastapi:python3.7
    server-command: uvicorn predict:app --host 0.0.0.0 --port 8000
    files:
      - name: model
        description: Model output file from TensorFlow
        path: model.h5
```

**Configuration explained:**

* `image` - Base Docker environment (includes FastAPI + Uvicorn)
* `server-command` - How to start your HTTP server
* `files` - Model artifacts to include (path is where they'll be saved)

**Using multiple model files:** You can include multiple models in one endpoint:

```yaml
files:
  - name: classifier
    path: classifier.pkl
  - name: regressor
    path: regressor.pkl
```

Both files become available in your endpoint container at the specified paths.
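
Inside the endpoint code, several model files can share the same load-once approach. This is a minimal sketch, assuming both files are Python pickles (suggested by the `.pkl` extension, but not confirmed here). `MODEL_PATHS` and `get_model` are hypothetical names. Only unpickle files you trust.

```python
import pickle

# Hypothetical mapping from a logical name to the path declared under `files`.
MODEL_PATHS = {
    "classifier": "classifier.pkl",
    "regressor": "regressor.pkl",
}

_models = {}  # cache: name -> loaded object


def get_model(name: str):
    """Load the named model on first use, then reuse the cached object."""
    if name not in _models:
        with open(MODEL_PATHS[name], "rb") as f:
            _models[name] = pickle.load(f)
    return _models[name]
```

A request handler would then call `get_model("classifier")` or `get_model("regressor")` as needed, and each file is read from disk at most once per process.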

#### Installing additional packages

If your base image doesn't include all dependencies, create `requirements-deployment.txt`:

```
valohai-utils
tensorflow==2.5.1
Pillow
python-multipart
```

Valohai installs these when building your deployment.

**If Uvicorn isn't in your base image:** Add `uvicorn` to `requirements-deployment.txt` and update `server-command` to use the installed path:

```yaml
server-command: ~/.local/bin/uvicorn predict:app --host 0.0.0.0 --port 8000
```

> **Scripts in subfolders:** Use Python module syntax, not file paths. For a script at `myfolder/predict.py`, use `myfolder.predict:app` instead of `myfolder/predict:app`.
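
The path-to-module conversion described above can be sketched in a few lines. `to_uvicorn_target` is a hypothetical helper written for illustration. It is not part of Uvicorn or Valohai.

```python
from pathlib import PurePosixPath


def to_uvicorn_target(script_path: str, app_name: str = "app") -> str:
    """Convert a script path such as 'myfolder/predict.py' to 'myfolder.predict:app'."""
    path = PurePosixPath(script_path)
    if path.suffix != ".py":
        raise ValueError(f"expected a .py file, got {script_path!r}")
    # Drop the extension and join the folder parts with dots.
    module = ".".join(path.with_suffix("").parts)
    return f"{module}:{app_name}"


target = to_uvicorn_target("myfolder/predict.py")
```

Here `target` is the string to place after `uvicorn` in `server-command`.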

### Push to Git

Commit your deployment code:

```shell
git add valohai.yaml predict.py requirements-deployment.txt
git commit -m "Add digit prediction endpoint"
git push
```

### Create the deployment

1. Open your project in Valohai
2. Click **Fetch repository** to pull your latest commit
3. Navigate to the **Deployment** tab
4. Click **Create deployment**
5. Name your deployment and select your deployment target (default: Valohai.Cloud)
6. Click **Create version**
7. Select the `digits` endpoint
8. Choose a `model.h5` file from your previous training runs
9. Click **Create version**

**Deployment status:** Watch the build progress. When it shows "100% - Available", your endpoint is ready.

### Test your endpoint

Verify it works from the Valohai UI:

1. Open your deployment
2. Click **Test deployment**
3. Select your `digits` endpoint
4. Add a field named `image` with type **File**
5. Upload a test image
6. Click **Send request**

You should see a response like: `Predicted_Digit: 7`
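
A client that consumes this endpoint programmatically has to extract the digit from that string. Because the handler returns a plain Python string, FastAPI serializes it as a JSON string, so the raw body arrives wrapped in double quotes. The following is a minimal sketch of the parsing step. `parse_digit` is a hypothetical helper name.

```python
import json


def parse_digit(raw_body: str) -> int:
    """Extract the integer from a JSON-encoded 'Predicted_Digit: N' response body."""
    text = json.loads(raw_body)  # '"Predicted_Digit: 7"' -> 'Predicted_Digit: 7'
    label, _, value = text.partition(":")
    if label.strip() != "Predicted_Digit":
        raise ValueError(f"unexpected response: {text!r}")
    return int(value.strip())


digit = parse_digit('"Predicted_Digit: 7"')
```

Returning a JSON object such as `{"digit": 7}` from the handler would make this parsing unnecessary, at the cost of changing the response format shown in this guide.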

***

**Next steps:**

* [Monitor your endpoint](https://github.com/valohai/dokuhai/blob/main/docs/monitor.md) with custom metrics
* [Set up aliases](https://github.com/valohai/dokuhai/blob/main/docs/concepts.md#deployment-alias) for production routing
* [Handle route prefixes](https://github.com/valohai/dokuhai/blob/main/docs/deployment-prefix.md) for path-based routing

