Getting Started

Valohai is an MLOps platform that handles infrastructure complexity while you build production ML systems. Train models, run experiments, and deploy to production, all without DevOps overhead.

Core Capabilities

Full experiment tracking and lineage

Every run becomes reproducible and auditable.

Automatic versioning — Code, data, parameters, and environments captured on every run
Metric comparison — Compare runs, spot regressions, track model drift
Dataset versioning — Link datasets to experiments without storage duplication

Infrastructure abstraction

Run ML workloads on any compute with one command.

Multi-cloud execution — AWS, GCP, Azure, Oracle Cloud Infrastructure, Scaleway, OVH, Slurm, Kubernetes, or on-premises hardware
Elastic scaling — Same code runs on 1 GPU or 100 GPUs
Production deployment — Batch inference, REST APIs, or streaming endpoints with built-in monitoring

Framework agnostic

Your code, your tools, zero lock-in.

Any ML framework — PyTorch, TensorFlow, JAX, Hugging Face, or custom stacks
Simple integration — Add a valohai.yaml to any project
API-first design — REST API and webhooks for CI/CD pipelines
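
As a rough illustration of the "Add a valohai.yaml to any project" point above, a minimal configuration might look like the sketch below. It defines a single step with a Docker image, the commands to run, and a couple of parameters and inputs. The step name, image tag, script name, parameter names, and bucket URL are all placeholders chosen for this example, not values from this page, so adapt them to your own project.

```yaml
# valohai.yaml: minimal sketch; every concrete value below is a placeholder
- step:
    name: train-model              # hypothetical step name
    image: python:3.11             # any Docker image with your dependencies
    command:
      - pip install -r requirements.txt
      - python train.py {parameters}   # {parameters} expands to the values defined below
    parameters:
      - name: epochs
        type: integer
        default: 10
      - name: learning_rate
        type: float
        default: 0.001
    inputs:
      - name: dataset
        default: s3://example-bucket/data/train.csv   # placeholder URL
```

The file sits at the root of the repository next to your existing code, so the training script itself does not need to be restructured around the platform.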

Who uses Valohai?

Data Scientists & ML Engineers — Focus on model development instead of cloud configurations
MLOps Teams — Standardize workflows across projects without forcing tool changes
Enterprise ML Teams — Meet compliance requirements with full audit trails and data lineage