Installation & Setup

Understand Valohai's architecture and choose the right deployment model for your infrastructure

Valohai is a machine learning platform that runs on your infrastructure.

Valohai's architecture separates the application control plane from compute and data infrastructure. This guide covers the core components and deployment models for platform engineers setting up Valohai.

Deployment Models

| Model | Application Layer | Compute & Data Layer | Best For |
| --- | --- | --- | --- |
| Hybrid | Valohai-managed (app.valohai.com) | Your infrastructure | Most teams: managed platform updates and quick setup |
| Self-Hosted | Your infrastructure | Your infrastructure | Air-gapped environments, strict data residency, full control |

Hybrid Deployment

The application runs on Valohai's infrastructure while compute and data remain in your environment. This ensures your code, data, and models never leave your control.

How it works:

  • Users access Valohai at app.valohai.com

  • Valohai communicates with your Redis queue machine to schedule jobs

  • Workers in your infrastructure pick up and execute jobs

  • All training data and artifacts stay in your storage

  • Only high-level job metadata flows to Valohai's database
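The flow above can be sketched in a few lines of Python. This is an illustrative model of the hybrid job flow, not Valohai's actual agent code; all names are hypothetical, and an in-memory deque stands in for the Redis queue.

```python
# Illustrative sketch of the hybrid job flow -- not Valohai's actual agent code.
from collections import deque

job_queue = deque()   # stands in for the Redis queue on your queue machine
valohai_db = []       # stands in for Valohai's metadata database
local_storage = {}    # stands in for your object storage

def schedule(job):
    """app.valohai.com pushes a job descriptor onto your queue machine."""
    job_queue.append(job)

def worker_step():
    """A worker in your infrastructure pulls and executes one job."""
    job = job_queue.popleft()
    outputs = {"model.bin": b"..."}             # produced by your training code
    local_storage[job["id"]] = outputs          # artifacts stay in your storage
    valohai_db.append({"id": job["id"], "status": "complete"})  # only metadata leaves

schedule({"id": "job-1", "command": "python train.py"})
worker_step()
```

The key point the sketch illustrates: the artifact bytes only ever land in `local_storage` (your environment), while `valohai_db` receives nothing beyond job IDs and statuses.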

Requirements:

  • Compute environment (one of):

    • Cloud account (AWS, Azure, GCP, Scaleway, Oracle)

    • On-premises infrastructure

    • Kubernetes cluster (optional, for K8s workers)

  • Network access from app.valohai.com to your queue machine

  • Object storage (S3, Azure Blob, GCS, or S3-compatible)
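Before installation, it is worth verifying that the required network paths are open. A minimal reachability check, assuming example host names and the Redis port used in this guide:

```python
# Quick TCP reachability check for the required network paths.
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage (host names are placeholders for your environment):
# for host, port in [("queue.example.com", 63790), ("s3.amazonaws.com", 443)]:
#     print(host, port, "OK" if can_reach(host, port) else "BLOCKED")
```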

Self-Hosted Deployment

Both application and infrastructure run entirely within your environment. You control updates, access, and all network policies.

How it works:

  • You host the Valohai application in your infrastructure

  • Users access your instance at your domain

  • All components communicate within your network

  • Valohai provides application updates as Docker images
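As an illustration, the components above might be wired together with something like the following Compose file. This is a hypothetical layout only: the image name, registry, and environment variables are placeholders, not Valohai's actual distribution; use the images and settings Valohai provides.

```yaml
# Hypothetical layout only -- use the images and settings Valohai provides.
services:
  valohai-app:
    image: your-registry.example.com/valohai/app:latest   # placeholder image name
    ports:
      - "443:8000"
    environment:
      DATABASE_URL: postgres://valohai:${DB_PASSWORD}@postgres:5432/valohai
      REDIS_URL: rediss://:${REDIS_PASSWORD}@queue:63790
    depends_on: [postgres, queue]
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: valohai
      POSTGRES_USER: valohai
      POSTGRES_PASSWORD: ${DB_PASSWORD}
  queue:
    image: redis:7
    command: ["redis-server", "--port", "63790", "--requirepass", "${REDIS_PASSWORD}"]
```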

Requirements:

  • Infrastructure for the application (bare metal, VM, or Kubernetes)

  • PostgreSQL database

  • Redis instance

  • Object storage

Components

| Component | Purpose | Specifications |
| --- | --- | --- |
| Queue Machine | Manages job scheduling and routing | Static VM or managed Redis service; 2 vCPUs, 4 GB RAM minimum; DNS name; runs Redis for the job queue and short-term logs; TLS certificate for secure communication |
| Workers | Execute ML workloads | Pull jobs from the queue machine; download inputs from object storage; run code in Docker containers; upload outputs to storage; scale down when idle |
| Object Storage | Stores artifacts, models, and logs | Git repository snapshots; execution logs; input datasets; output artifacts (models, visualizations) |
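For a self-managed queue machine, the Redis side of the description above typically translates into configuration along these lines. This is an illustrative fragment; the file paths and password are placeholders for values from your own environment.

```
# Illustrative redis.conf fragment for a Valohai queue machine.
port 0                                    # disable the plaintext port
tls-port 63790                            # TLS-only, on the port Valohai and workers use
tls-cert-file /etc/redis/tls/queue.crt    # e.g. a Let's Encrypt certificate
tls-key-file  /etc/redis/tls/queue.key
requirepass <your-redis-password>         # stored in your secret manager
```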

Worker Types

Valohai supports flexible worker infrastructure to match your environment:

| Type | Description | Use Case |
| --- | --- | --- |
| Autoscaled VMs | Cloud instances that scale based on demand | AWS, Azure, GCP, Scaleway, Oracle Cloud (OCI) |
| Kubernetes Pods | Containerized workers in K8s clusters | Cloud or on-premises Kubernetes |
| Static Machines | Physical or virtual machines | On-premises data centers, dedicated hardware |
| SLURM Nodes | HPC cluster integration | Existing SLURM-managed compute clusters |

Supported Object Storage

  • AWS S3

  • Azure Blob Storage

  • Google Cloud Storage

  • Oracle Cloud Infrastructure (OCI) Object Storage

  • S3-compatible storage (MinIO, NetApp, etc.)

Networking & Security

Hybrid Deployment

| Source | Destination | Port | Purpose |
| --- | --- | --- | --- |
| app.valohai.com (34.248.245.191, 63.34.156.112) | Queue machine | 63790 | Job scheduling and queue management |
| Workers | Queue machine | 63790 | Pull jobs from Redis queue |
| Workers | app.valohai.com | 443 | Report status and metadata |
| Workers | Object storage | 443 | Download inputs, upload outputs |
| Workers | Internet | 443 | Pull Docker images and packages |
| Queue machine | Internet | 80 | Let's Encrypt certificate renewal (or use your own certificate) |
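The egress rules above can be encoded as data and checked programmatically, which is handy when auditing firewall configuration. A sketch in which the rule set simply mirrors the documented paths:

```python
# The hybrid-deployment network paths from the table above, as data.
ALLOWED_PATHS = {
    ("app.valohai.com", "queue", 63790),   # job scheduling and queue management
    ("workers", "queue", 63790),           # pull jobs from Redis queue
    ("workers", "app.valohai.com", 443),   # report status and metadata
    ("workers", "storage", 443),           # download inputs, upload outputs
    ("workers", "internet", 443),          # pull Docker images and packages
    ("queue", "internet", 80),             # Let's Encrypt certificate renewal
}

def is_allowed(source: str, destination: str, port: int) -> bool:
    """Check a (source, destination, port) tuple against the documented paths."""
    return (source, destination, port) in ALLOWED_PATHS
```

Note that no rule has workers as the destination, matching the security principle below: workers never accept inbound connections.

```python
# e.g. is_allowed("app.valohai.com", "workers", 443) -> False
```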

Security principles:

  • Code and data never leave your environment

  • Workers never accept inbound connections from Valohai

  • Only job metadata and logs are sent to Valohai's database

  • All communication uses TLS encryption

Self-Hosted Deployment

| Source | Destination | Port | Purpose |
| --- | --- | --- | --- |
| Users | Valohai app | 443 | Web interface access |
| Valohai app | PostgreSQL | 5432 | Database queries (internal only) |
| Valohai app | Queue machine | 63790 | Job scheduling |
| Workers | Queue machine | 63790 | Pull jobs from Redis queue |
| Workers | Valohai app | 443 | Report status and metadata |
| Workers | Object storage | 443 | Download inputs, upload outputs |
💡 In self-hosted deployments, all network policies are under your control. The ports listed are defaults and can be customized during setup.

Choose Your Installation Path

  • I have AWS

  • I have Azure

  • I have GCP

  • I have Kubernetes

  • I have on-premises servers

  • I have OpenShift

  • I have a SLURM cluster

What You'll Need

Every installation requires:

From Valohai:

  • Queue address (e.g., yourcompany.vqueue.net)

    • You can also provision your own certificate

  • Redis password (stored in your secret manager)

  • For self-hosted: Docker images and license details

From your environment:

  • Cloud account with appropriate permissions

  • Object storage bucket

  • Network connectivity between components

  • For GPU workloads: GPU quota and drivers

Before you start:

  • Review your organization's network and security policies

  • Ensure you have necessary cloud permissions

  • Decide on resource allocation (VM types, storage size)

  • Plan your environment separation (dev/prod)

Getting Help

Contact [email protected] if you:

  • Need help choosing the right deployment model

  • Have specific security or compliance requirements

  • Want guidance on resource sizing

  • Encounter issues during installation
