Installation & Setup
Understand Valohai's architecture and choose the right deployment model for your infrastructure
Valohai is a machine learning platform that runs on your infrastructure.
Valohai's architecture separates the application control plane from compute and data infrastructure. This guide covers the core components and deployment models for platform engineers setting up Valohai.
Deployment Models

Hybrid: the application is Valohai-managed (app.valohai.com); compute and data run in your infrastructure. Best for most teams, and anyone who wants managed platform updates and a quick setup.

Self-Hosted: the application, compute, and data all run in your infrastructure. Best for air-gapped environments, strict data residency, and full control.
Hybrid Deployment

The application runs on Valohai's infrastructure while compute and data remain in your environment. This ensures your code, data, and models never leave your control.
How it works:
Users access Valohai at app.valohai.com
Valohai communicates with your Redis queue machine to schedule jobs
Workers in your infrastructure pick up and execute jobs
All training data and artifacts stay in your storage
Only high-level job metadata flows to Valohai's database
Requirements:
Compute environment: a cloud account (AWS, Azure, GCP, Scaleway, Oracle) or on-premises infrastructure
Kubernetes cluster (optional, for K8s workers)
Network access from app.valohai.com to your queue machine
Object storage (S3, Azure Blob, GCS, or S3-compatible)
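Before installation, it can help to sanity-check network reachability from a machine in your environment. A minimal sketch, assuming placeholder endpoints (`yourcompany.vqueue.net` is only an example queue address; substitute your own queue and storage hosts):

```python
import socket

def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder endpoints: substitute your real queue address and storage host.
endpoints = [
    ("queue machine", "yourcompany.vqueue.net", 63790),
    ("object storage", "s3.amazonaws.com", 443),
]
for name, host, port in endpoints:
    status = "reachable" if check_port(host, port) else "unreachable"
    print(f"{name} {host}:{port} -> {status}")
```

A failed queue-machine check usually points at a missing firewall rule for port 63790 (see Networking & Security below).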
Self-Hosted Deployment
Both application and infrastructure run entirely within your environment. You control updates, access, and all network policies.
How it works:
You host the Valohai application in your infrastructure
Users access your instance at your domain
All components communicate within your network
Valohai provides application updates as Docker images
Requirements:
Infrastructure for the application (bare metal, VM, or Kubernetes)
PostgreSQL database
Redis instance
Object storage
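As a rough sketch, these pieces could be wired together with Docker Compose. Everything here is illustrative: the application image reference and credentials are placeholders (Valohai supplies the actual images and license details), and `postgres:15` / `redis:7` stand in for whichever database and Redis versions you operate:

```yaml
# Illustrative only -- image names, ports, and credentials are placeholders.
services:
  valohai-app:
    image: <valohai-app-image-from-your-license>  # provided by Valohai
    ports:
      - "443:443"
    depends_on: [db, queue]
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: <from-your-secret-manager>
  queue:
    image: redis:7
```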
Components
Queue Machine
Manages job scheduling and routing
• Static VM or managed Redis service
• 2 vCPUs, 4 GB RAM minimum
• DNS name
• Runs Redis for job queue and short-term logs
• TLS certificate for secure communication
Workers
Execute ML workloads
• Pull jobs from queue machine
• Download inputs from object storage
• Run code in Docker containers
• Upload outputs to storage
• Scale down when idle
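The worker lifecycle above can be sketched as a toy loop. This is illustrative only, not Valohai's implementation: the queue, storage, and container runner are stand-ins:

```python
# Toy model of a worker draining a job queue; every component is a stand-in.
from dataclasses import dataclass

@dataclass
class Job:
    job_id: str
    image: str    # Docker image to run
    inputs: list  # object-storage keys to download

def run_worker(queue, storage, run_container):
    """Drain the queue: download inputs, run each job, upload its outputs."""
    completed = []
    while queue:
        job = queue.pop(0)                              # 1. pull job from queue machine
        inputs = [storage[key] for key in job.inputs]   # 2. download inputs from storage
        outputs = run_container(job.image, inputs)      # 3. run code in a container
        storage[f"outputs/{job.job_id}"] = outputs      # 4. upload outputs to storage
        completed.append(job.job_id)
    return completed                                    # 5. idle -> worker can scale down
```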
Object Storage
Stores artifacts, models, and logs
• Git repository snapshots
• Execution logs
• Input datasets
• Output artifacts (models, visualizations)
Worker Types
Valohai supports flexible worker infrastructure to match your environment:
Autoscaled VMs: cloud instances that scale based on demand (AWS, Azure, GCP, Scaleway, Oracle Cloud/OCI)
Kubernetes Pods: containerized workers in K8s clusters (cloud or on-premises Kubernetes)
Static Machines: physical or virtual machines (on-premises data centers, dedicated hardware)
SLURM Nodes: HPC cluster integration (existing SLURM-managed compute clusters)
Supported Object Storage
AWS S3
Azure Blob Storage
Google Cloud Storage
Oracle Cloud Infrastructure (OCI) Object Storage
S3-compatible storage (MinIO, NetApp, etc.)
Networking & Security
Hybrid Deployment
app.valohai.com (34.248.245.191, 63.34.156.112) → Queue machine, port 63790: job scheduling and queue management
Workers → Queue machine, port 63790: pull jobs from the Redis queue
Workers → app.valohai.com, port 443: report status and metadata
Workers → Object storage, port 443: download inputs, upload outputs
Workers → Internet, port 443: pull Docker images and packages
Queue machine → Internet, port 80: Let's Encrypt certificate renewal (or use your own certificate)
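The queue machine's rules above can be summarized as a provider-neutral sketch. This is pseudo-config, not any cloud's exact security-group syntax, and `<worker-subnets>` is a placeholder for your worker network:

```yaml
# Illustrative pseudo-config for the hybrid queue machine; adapt to your provider.
queue_machine:
  ingress:
    - from: [34.248.245.191/32, 63.34.156.112/32]  # app.valohai.com
      port: 63790                                  # job scheduling
    - from: <worker-subnets>                       # workers pulling jobs
      port: 63790
  egress:
    - to: internet
      port: 80  # Let's Encrypt renewal (omit if using your own certificate)
```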
Security principles:
Code and data never leave your environment
Workers never accept inbound connections from Valohai
Only job metadata and logs are sent to Valohai's database
All communication uses TLS encryption
Self-Hosted Deployment
Users → Valohai app, port 443: web interface access
Valohai app → PostgreSQL, port 5432: database queries (internal only)
Valohai app → Queue machine, port 63790: job scheduling
Workers → Queue machine, port 63790: pull jobs from the Redis queue
Workers → Valohai app, port 443: report status and metadata
Workers → Object storage, port 443: download inputs, upload outputs
💡 In self-hosted deployments, all network policies are under your control. The ports listed are defaults and can be customized during setup.
Choose Your Installation Path
I have AWS
Hybrid: AWS Hybrid Deployment
Self-hosted on EC2: AWS Self-Hosted EC2
Self-hosted on EKS: AWS Self-Hosted EKS
I have Azure
Hybrid (Marketplace): Azure Marketplace
Hybrid (Manual): Azure Hybrid Deployment
I have GCP
Hybrid: GCP Hybrid Deployment
I have Kubernetes
Workers on any K8s: Kubernetes Workers
OVH Cloud: OVH Kubernetes
Oracle Cloud: Oracle Kubernetes
I have on-premises servers
Linux workers: On-Premises Installation
I have OpenShift
Self-hosted: OpenShift Self-Hosted
I have SLURM cluster
SLURM integration: SLURM Cluster
What You'll Need
Every installation requires:
From Valohai:
Queue address (e.g., yourcompany.vqueue.net)
Certificate for the queue address (you can also provision your own)
Redis password (stored in your secret manager)
For self-hosted: Docker images and license details
From your environment:
Cloud account with appropriate permissions
Object storage bucket
Network connectivity between components
For GPU workloads: GPU quota and drivers
Before you start:
Review your organization's network and security policies
Ensure you have necessary cloud permissions
Decide on resource allocation (VM types, storage size)
Plan your environment separation (dev/prod)
Getting Help
Contact [email protected] if you:
Need help choosing the right deployment model
Have specific security or compliance requirements
Want guidance on resource sizing
Encounter issues during installation