# Inference & Serving

Valohai offers two distinct approaches for running inference, each designed for different use cases and latency requirements.

### Which path fits your needs?

#### Batch Inference via Executions

Run inference jobs on datasets using Valohai's standard execution system.

**Use this when:**

* You're processing large datasets or file batches
* Predictions can take minutes or hours
* You need to run scheduled or triggered inference jobs

**What you'll build:**

* Python inference scripts
* Standard Valohai executions with inputs and outputs
* Scheduled or API-triggered batch jobs
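
A batch inference script is ordinary Python. It reads the files Valohai downloads as execution inputs and writes results to the directory Valohai uploads as outputs. The sketch below makes several assumptions. It reads the directories from the `VH_INPUTS_DIR` and `VH_OUTPUTS_DIR` environment variables, falling back to the conventional `/valohai/inputs` and `/valohai/outputs` mount points. It expects an input named `data` that holds CSV files. The `predict` function is a hypothetical placeholder for your model. Adapt all of these names to your project.

```python
import csv
import os
from pathlib import Path


def predict(row: dict[str, str]) -> float:
    # Hypothetical placeholder model: sums the numeric columns of one CSV row.
    return sum(float(value) for value in row.values())


def run_batch(inputs_dir: Path, outputs_dir: Path, input_name: str = "data") -> int:
    """Score every CSV file under inputs_dir/input_name; return the row count."""
    files = sorted((inputs_dir / input_name).glob("*.csv"))
    if files:
        outputs_dir.mkdir(parents=True, exist_ok=True)
    scored = 0
    for csv_path in files:
        # One predictions file per input file, written to the outputs directory.
        out_path = outputs_dir / f"{csv_path.stem}_predictions.csv"
        with csv_path.open(newline="") as src, out_path.open("w", newline="") as dst:
            writer = csv.writer(dst)
            writer.writerow(["row", "prediction"])
            for i, row in enumerate(csv.DictReader(src)):
                writer.writerow([i, predict(row)])
                scored += 1
    return scored


if __name__ == "__main__":
    # Assumed: Valohai exposes its input/output directories through these
    # variables; the fallbacks are the conventional mount points.
    inputs = Path(os.getenv("VH_INPUTS_DIR", "/valohai/inputs"))
    outputs = Path(os.getenv("VH_OUTPUTS_DIR", "/valohai/outputs"))
    print(f"Scored {run_batch(inputs, outputs)} rows")
```

`run_batch` takes its directories as arguments, so you can test it locally against a folder of sample files before you wire it into a step.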

**Where to start:** [Run Batch Inference](/serving-your-models/deploy-batch.md)

***

#### Real-Time Endpoints

Deploy models as RESTful APIs on Kubernetes for low-latency predictions.

**Use this when:**

* You need predictions in milliseconds or seconds
* Your application requires synchronous responses
* You're building user-facing features or interactive systems

**What you'll build:**

* FastAPI or Flask inference servers
* Auto-scaling Kubernetes endpoints
* Version-controlled deployment aliases
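
A real-time endpoint is a long-running HTTP server. It loads the model once and answers each request synchronously. FastAPI or Flask is the usual choice. To keep this sketch dependency-free, it swaps in Python's standard-library `http.server`. The `/predict` route, the `{"features": [...]}` request shape, port 8000, and the `predict` function are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def predict(features: list[float]) -> float:
    # Hypothetical placeholder model: half the sum of the input features.
    return 0.5 * sum(features)


class InferenceHandler(BaseHTTPRequestHandler):
    """Handles POST /predict with a JSON body like {"features": [1.0, 2.0]}."""

    def do_POST(self) -> None:
        if self.path != "/predict":
            self.send_error(404, "Unknown route")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            status, result = 200, {"prediction": predict(payload["features"])}
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON, a missing "features" key, or non-numeric values.
            status, result = 400, {"error": str(exc)}
        body = json.dumps(result).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve for real (blocks until interrupted):
#   ThreadingHTTPServer(("0.0.0.0", 8000), InferenceHandler).serve_forever()
```

The same shape carries over to FastAPI or Flask. You need one route that parses JSON, calls the model, and returns JSON. Malformed requests get a 400 response instead of crashing the server.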

**Where to start:** [Deploy a Real-Time Endpoint](/serving-your-models/real-time-endpoints.md)

***

### Key Differences

| Factor                | Real-Time Endpoints                | Batch Executions               |
| --------------------- | ---------------------------------- | ------------------------------ |
| **Latency**           | Milliseconds to seconds            | Minutes to hours               |
| **Infrastructure**    | Kubernetes cluster (configured by you) | Valohai-managed VMs            |
| **API**               | You build RESTful APIs             | Valohai REST API triggers jobs |
| **Scaling**           | Kubernetes auto-scaling            | VM-based execution queues      |
| **Code Requirements** | Web framework (FastAPI, Flask)     | Standard Python scripts        |
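
For batch executions, "Valohai REST API triggers jobs" means your application or scheduler asks Valohai to start an execution. It does not call your model directly. The sketch below only builds the HTTP request and does not send it. The endpoint path, the `Authorization: Token ...` header, and the payload fields (`project`, `commit`, `step`, `inputs`) are assumptions based on Valohai's public API, so confirm them in the API reference. The token, project ID, step name, and input URL are placeholders.

```python
import json
import urllib.request

# Assumed endpoint for creating executions; verify against the API reference.
API_URL = "https://app.valohai.com/api/v0/executions/"


def build_execution_request(token: str, project_id: str, commit: str, step: str,
                            inputs: dict[str, list[str]]) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts one execution."""
    payload = {
        "project": project_id,
        "commit": commit,
        "step": step,      # name of the batch inference step in valohai.yaml
        "inputs": inputs,  # input name -> list of data URLs to score
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Token {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Sending it requires a real token and project:
#   with urllib.request.urlopen(build_execution_request(...)) as resp:
#       print(json.load(resp))
```

A scheduler, webhook, or CI job can call a helper like this to trigger batch runs. Your application never has to host the model itself.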

***

### Need Help Deciding?

**Ask yourself:** "Does my user need an answer right now?"

* **Yes** → Real-Time Endpoints
* **No** → Batch Executions

Still unsure? Check out our [video overview](https://www.loom.com/embed/488d391bb146463f91109743ad429ca6) or reach out to your Valohai representative.

