Batch Inference

Batch inference in Valohai runs as a standard execution, letting you process datasets or file collections at scale without managing infrastructure.

How it works

Batch inference uses the same execution system you use for training:

  1. Define a step in valohai.yaml with your inference code (a minimal sketch follows these steps)

  2. Specify inputs (model files and data to process)

  3. Run the execution via CLI, API, or schedule it

  4. Collect results from outputs
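To make steps 1 and 2 concrete, here is a minimal sketch of what such a step could look like in valohai.yaml. The step name, Docker image, commands, and input sources below are placeholders rather than part of any existing project; adapt them to your own code and data.

```yaml
- step:
    name: batch-inference          # hypothetical step name
    image: python:3.11             # any Docker image with your dependencies
    command:
      - pip install -r requirements.txt
      - python predict.py          # your inference script
    inputs:
      - name: model                # trained model file(s) to load
        default: datum://production-model
      - name: data                 # the files to run predictions on
        default: s3://example-bucket/batch/*.csv
```

Inside the execution, Valohai downloads each input under /valohai/inputs/<input-name>/ and uploads anything the script writes to /valohai/outputs/ as versioned outputs, so predict.py only needs to read and write local files. You can then launch the step from the command line (for example with vh execution run batch-inference), through the API, or on a schedule.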

Key advantage: You already know this system. If you've run training jobs, you can run inference jobs.

What you can do

  • Process thousands of images, CSVs, or other file types

  • Schedule recurring inference jobs (e.g., nightly predictions)

  • Trigger inference via API when new data arrives

  • Chain inference into pipelines after training completes (see the pipeline sketch after this list)

  • Track inference metrics alongside training metrics
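For the pipeline case, the sketch below shows one way to chain the two: a training node followed by a batch inference node, with the trained model passed along the edge. The step names, node names, and the output file pattern are assumptions; they have to match steps and output files that actually exist in your valohai.yaml.

```yaml
- pipeline:
    name: train-then-predict
    nodes:
      - name: train
        type: execution
        step: train-model        # assumed training step defined elsewhere in the file
      - name: predict
        type: execution
        step: batch-inference    # the inference step sketched earlier
    edges:
      # Feed model files produced by the training node
      # into the "model" input of the inference node.
      - [train.output.model*, predict.input.model]
```

When the training node finishes, the pipeline starts the inference node automatically with the fresh model as its input, so a retrain-and-repredict cycle becomes a single pipeline run.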

Example use cases

Image classification at scale: process a directory of product images to tag inventory items.

Batch predictions on tabular data: run monthly churn predictions on your entire customer database.

Document processing: extract entities from legal documents or medical records in batches.

When to use batch inference

Choose batch inference when:

  • You're processing datasets, not individual requests

  • Latency requirements are in minutes or hours, not milliseconds

  • You want to leverage Valohai's execution tracking and versioning

  • You need to schedule or automate inference runs

Need lower latency? Check out Real-Time Endpoints for sub-second predictions.

Next steps

See the practical examples, or jump straight to defining your inference step in valohai.yaml.
