See also

For the technical specifications, go to the valohai.yaml endpoint section.

A deployment is a group of versioned web endpoints run on a Kubernetes cluster for online inference.

After you've specified how your model is served using your project and the valohai.yaml endpoint definitions, you can create and manage deployments under the Deployment tab in the Valohai web interface.

Batch inference

Valohai deployments are recommended if you want low-latency predictions on a per-sample basis.

If you only need non-interactive batch predictions (e.g., taking many samples as input and writing predictions to a file), you can create a Valohai step to handle that: take your model and samples as inputs and write your predictions to /valohai/outputs to be uploaded.
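The batch step described above can be sketched roughly as follows. This is a minimal illustration, not Valohai's own tooling: the `predict` function is a placeholder for a real model, and the CSV layout and the `predictions.csv` file name are assumptions made for the example.

```python
import csv
import os


def predict(sample):
    # Placeholder model (assumption): the "prediction" is the sum of the features.
    return sum(sample)


def run_batch(input_path, output_dir):
    """Read numeric samples from a CSV file and write one prediction per row."""
    os.makedirs(output_dir, exist_ok=True)
    output_path = os.path.join(output_dir, "predictions.csv")
    with open(input_path, newline="") as src, open(output_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["row", "prediction"])
        for index, row in enumerate(csv.reader(src)):
            sample = [float(value) for value in row]
            writer.writerow([index, predict(sample)])
    return output_path
```

Inside a step, the function would be called with the path of the downloaded samples file and with /valohai/outputs as `output_dir`, so that the predictions file is picked up for upload.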

The main upside of the batch inference approach is cost: no servers are constantly running. The main downside is latency, as a worker must be started when predictions are requested. So if your predictions are not time-sensitive and each group of predictions can take 10 minutes or so, batch inference is the way to go.

To learn more about batch inference:

If you want to dive straight into deploying your first HTTP endpoint, check out our tutorial on how to deploy a model for online inference.

Deployment targets

Each deployment has a deployment target, which is the Kubernetes cluster that the service is served on. The default deployment target is a shared Kubernetes cluster managed by Valohai, but you can also use your own cluster.

You would use multiple deployment targets if you wish to run your service in different geographic locations.

Reach out to your Valohai contact if you wish to set up additional deployment targets.

Each deployment will be assigned an address in the format:
… which translates to the following on the shared Kubernetes cluster:<owner>/<project>/<deployment>/

Deployment versions

A deployment version is a Docker image that Valohai builds on top of the Docker image you specify in the endpoint YAML definition. The built image includes 1) your code repository and 2) all files you defined in the YAML file and specified during deployment version creation.

The deployment version is the actual artifact that is served on the target Kubernetes cluster.

Each deployment can have multiple versions at the same time.

Running deployment versions will be accessible through:<owner>/<project>/<deployment>/<version>

Deployment endpoints

A deployment endpoint is one or more Docker containers running HTTP servers in an auto-scaling Kubernetes cluster. You define endpoints in the valohai.yaml file.

You can have multiple endpoints per deployment version because a single project can have various inference needs for different contexts.
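To make the idea of an endpoint as an HTTP server concrete, here is a minimal sketch of a WSGI application that answers a POSTed JSON body with a prediction. It is an illustration under assumptions, not a required structure: the `predict` placeholder, the `features` field name, and the response shape are all hypothetical, and how the server is actually started is determined by your valohai.yaml endpoint definition.

```python
import json


def predict(features):
    # Placeholder model (assumption): returns the mean of the input features.
    return sum(features) / len(features)


def app(environ, start_response):
    """Minimal WSGI application that answers POSTed JSON with a prediction."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        body = json.dumps({"prediction": predict(payload["features"])})
        status = "200 OK"
    except (KeyError, TypeError, ValueError, ZeroDivisionError):
        # Malformed or empty input ends up here.
        body = json.dumps({"error": "expected a JSON object with a 'features' list"})
        status = "400 Bad Request"
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]
```

For local experimentation, an application like this can be served with the standard library's `wsgiref.simple_server.make_server`.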

Each endpoint will get a separate URL:<owner>/<project>/<deployment>/<version>/<endpoint>
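The way the address grows from deployment to version to endpoint can be sketched with a small helper. This only assembles the path portion shown above; the host part of the address is not covered here, and the function name and example values are hypothetical.

```python
def deployment_path(owner, project, deployment, version=None, endpoint=None):
    """Join the path segments for a deployment, a version, or an endpoint."""
    if endpoint is not None and version is None:
        raise ValueError("an endpoint path requires a version")
    parts = [owner, project, deployment]
    if version is not None:
        parts.append(version)
    if endpoint is not None:
        parts.append(endpoint)
    return "/".join(parts)
```

For example, `deployment_path("acme", "churn", "api", "20240101.0", "predict")` yields the endpoint-level path, while omitting the last two arguments yields the deployment-level path.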

Deployment aliases

A deployment alias is a name, like staging or production, that points to a deployment version.

Aliases create canonical URLs so you can use Valohai to control which version is being served in each context. This allows you to update the currently used version or roll back to a previous version if something goes wrong. Changing alias routing is instantaneous.

For example, an alias could be used by applications that consume your predictions, so they don't need to change the URL when you release a new endpoint version.
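Conceptually, an alias behaves like a named pointer that can be moved between versions while clients keep using the same name. The toy model below illustrates that idea only; it is not how Valohai implements alias routing, and the class, method names, and version strings are hypothetical.

```python
class AliasTable:
    """Toy model of aliases as movable pointers to deployment versions."""

    def __init__(self):
        self._current = {}   # alias -> version currently served
        self._previous = {}  # alias -> stack of earlier versions

    def point(self, alias, version):
        # Remember the old target so it can be restored later.
        if alias in self._current:
            self._previous.setdefault(alias, []).append(self._current[alias])
        self._current[alias] = version

    def rollback(self, alias):
        # Restore the most recent earlier target of the alias.
        history = self._previous.get(alias)
        if not history:
            raise LookupError(f"no earlier version recorded for alias {alias!r}")
        self._current[alias] = history.pop()

    def resolve(self, alias):
        return self._current[alias]
```

A client that always asks for `production` sees whichever version the alias currently points to, so releasing or rolling back requires no change on the client side.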