Inference & Serving

Valohai offers two distinct approaches to running inference, each suited to different use cases and latency requirements.

Which path fits your needs?

Batch Inference via Executions

Run inference jobs on datasets using Valohai's standard execution system.

Use this when:

  • You're processing large datasets or file batches

  • Predictions can take minutes or hours

  • You need to run scheduled or triggered inference jobs

What you'll build:

  • Python inference scripts

  • Standard Valohai executions with inputs and outputs

  • Scheduled or API-triggered batch jobs
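A minimal sketch of such an inference script, assuming a pickled model and a CSV dataset declared as step inputs named `model` and `data` (the names are illustrative). Valohai downloads declared inputs under `/valohai/inputs/<input-name>/` and uploads anything written to `/valohai/outputs/` as execution outputs:

```python
import glob
import json
import pickle

import pandas as pd

# Valohai places each declared input under /valohai/inputs/<input-name>/.
model_path = glob.glob("/valohai/inputs/model/*")[0]
data_path = glob.glob("/valohai/inputs/data/*")[0]

with open(model_path, "rb") as f:
    model = pickle.load(f)

# Score the whole batch in one pass.
data = pd.read_csv(data_path)
predictions = model.predict(data)

# Files written to /valohai/outputs/ are stored as execution outputs.
with open("/valohai/outputs/predictions.json", "w") as f:
    json.dump({"predictions": predictions.tolist()}, f)

print(f"Wrote {len(predictions)} predictions")
```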

Where to start: Run Batch Inference
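For the API-triggered variant, creating an execution through Valohai's REST API looks roughly like the sketch below. The token, project ID, commit, and step name are placeholders, and the exact payload fields should be confirmed against the Valohai API reference:

```python
import requests

response = requests.post(
    "https://app.valohai.com/api/v0/executions/",
    headers={"Authorization": "Token <your-api-token>"},
    json={
        "project": "<project-id>",
        "commit": "<commit-identifier>",
        "step": "batch-inference",  # step name from your valohai.yaml (illustrative)
    },
)
response.raise_for_status()
print("Execution created:", response.json())
```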


Real-Time Endpoints

Deploy models as RESTful APIs on Kubernetes for low-latency predictions.

Use this when:

  • You need predictions in milliseconds or seconds

  • Your application requires synchronous responses

  • You're building user-facing features or interactive systems

What you'll build:

  • FastAPI or Flask inference servers

  • Auto-scaling Kubernetes endpoints

  • Version-controlled deployment aliases
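A minimal FastAPI sketch of such an inference server; the model file name and the flat feature-list schema are assumptions made for illustration:

```python
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model once at startup so each request only pays for inference.
with open("model.pkl", "rb") as f:  # bundled with the deployment (illustrative name)
    model = pickle.load(f)


class PredictionRequest(BaseModel):
    features: list[float]


@app.post("/predict")
def predict(request: PredictionRequest):
    # Synchronous, low-latency prediction for a single payload.
    result = model.predict([request.features])
    return {"prediction": result.tolist()[0]}
```

Locally you can serve this with `uvicorn predict:app` (assuming the file is named predict.py); in a Valohai deployment the same app runs behind the auto-scaling Kubernetes endpoint.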

Where to start: Deploy a Real-Time Endpoint


Key Differences

| Factor | Real-Time Endpoints | Batch Executions |
| --- | --- | --- |
| Latency | Milliseconds to seconds | Minutes to hours |
| Infrastructure | Kubernetes cluster (you configure) | Valohai-managed VMs |
| API | You build RESTful APIs | Valohai REST API triggers jobs |
| Scaling | Kubernetes auto-scaling | VM-based execution queues |
| Code Requirements | Web framework (FastAPI, Flask) | Standard Python scripts |


Need Help Deciding?

Ask yourself: "Does my user need an answer right now?"

  • Yes → Real-Time Endpoints

  • No → Batch Executions

Still unsure? Check out our video overview or reach out to your Valohai representative.
