There are two distinct methods for running inference on Valohai, each suited to different use cases:
- **Valohai Deployments:** This option involves pushing new deployment versions to your Kubernetes cluster. Under this approach, you are responsible for writing your own RESTful APIs using frameworks such as FastAPI or Flask, and for configuring the Kubernetes cluster node groups and scaling rules (see the FastAPI sketch below).
- **Valohai Executions for Inference:** This method lets you define an inference job as a standard Valohai execution. Using the Valohai APIs, you can launch a new inference job, specifying the required data and model file(s) (see the API call sketch after the FastAPI example).
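For Valohai Deployments, you bring your own HTTP server. The sketch below shows one way to expose a model behind a `/predict` endpoint with FastAPI; the model path, input schema, and predict call are placeholders to adapt to your own model and framework.

```python
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Hypothetical path to a model file baked into the Docker image; replace with your own.
MODEL_PATH = "/app/model.pkl"

with open(MODEL_PATH, "rb") as f:
    model = pickle.load(f)


class PredictionRequest(BaseModel):
    # Placeholder input schema; replace with your model's real features.
    features: list[float]


@app.post("/predict")
def predict(request: PredictionRequest):
    # scikit-learn-style predict call; adjust to your framework of choice.
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()}
```

You would typically run this with an ASGI server such as `uvicorn main:app --host 0.0.0.0 --port 8000`, with the startup command and port declared in the endpoint definition of your valohai.yaml.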
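With Valohai Executions for Inference, a batch prediction is just an execution that you can start over the Valohai REST API. The sketch below assumes an API token in the `VALOHAI_API_TOKEN` environment variable and a step named `batch-inference` in your valohai.yaml; the project ID, input names, and URLs are placeholders, so check the Valohai API documentation for the exact payload your setup needs.

```python
import os

import requests

# Personal API token generated in the Valohai UI; read from the environment here.
VALOHAI_API_TOKEN = os.environ["VALOHAI_API_TOKEN"]

payload = {
    "project": "your-project-id",      # placeholder project ID
    "commit": "main",                  # code version (branch or commit) to run
    "step": "batch-inference",         # step name defined in valohai.yaml
    "inputs": {
        # Map the step's input names to the model and data to use for this run.
        "model": ["datum://production-model"],         # Datum Alias (placeholder)
        "data": ["s3://my-bucket/batches/today.csv"],  # placeholder data URL
    },
}

response = requests.post(
    "https://app.valohai.com/api/v0/executions/",
    json=payload,
    headers={"Authorization": f"Token {VALOHAI_API_TOKEN}"},
)
response.raise_for_status()
print("Launched execution:", response.json().get("id"))
```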
Comparing Valohai deployment options
When comparing Valohai deployment options, match your requirements and preferences to the approach that fits them:
| Requirement | Valohai Deployments | Valohai Executions for Inference |
|---|---|---|
| Latency & Inference Time | Offers low latency, with predictions served in (sub)seconds. | Predictions can take longer, typically minutes. |
| API | Requires you to create your own RESTful APIs using frameworks like FastAPI or Flask. | Uses Valohai’s RESTful APIs to launch predictions with your data and model(s). |
| Versioning | Tracks code and model file versions through deployment versions. | Tracks code and model file versions through execution versioning. |
| Metrics | Allows custom log collection from endpoints for visualization in Valohai. | Enables custom log collection for visualization in Valohai (see the metadata sketch after this table). |
| Aliases | Provides aliasing for friendly endpoint names. | Provides versioning for model files and Datum Aliases to pin exact model versions. |
| Configuration & Management | You are responsible for setting up and managing the Kubernetes cluster, including node groups, access control, and scaling rules. | Valohai handles virtual machine management and scaling for you. |
| What You Need to Provide | Inference code, your RESTful APIs, a base Docker image, model file(s), and resource requirements (CPU, memory). | Inference code, a base Docker image, model file(s), and virtual machine type (CPU/GPU, memory). |
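As the Metrics row notes, both options let you collect custom logs for visualization in Valohai. One common pattern, sketched below, is to print JSON objects to stdout (one object per line) from your inference code so they are picked up as execution metadata; the metric names here are examples only.

```python
import json
import time


def log_metrics(batch_index: int, latency_seconds: float, predictions: int) -> None:
    # Valohai collects single-line JSON objects printed to stdout as metadata.
    print(json.dumps({
        "batch": batch_index,
        "latency_seconds": round(latency_seconds, 3),
        "predictions": predictions,
    }))


start = time.time()
# ... run inference on a batch here ...
log_metrics(batch_index=0, latency_seconds=time.time() - start, predictions=128)
```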