Installation & Setup

Understand Valohai's architecture and choose the right deployment model for your infrastructure

Valohai is a machine learning platform that runs on your infrastructure.

Valohai's architecture separates the application control plane from compute and data infrastructure. This guide covers the core components and deployment models for platform engineers setting up Valohai.

Deployment Models

| Model | Application Layer | Compute & Data Layer | Best For |
| --- | --- | --- | --- |
| Hybrid | Valohai-managed (app.valohai.com) | Your infrastructure | Most teams: managed platform updates and quick setup |
| Self-Hosted | Your infrastructure | Your infrastructure | Air-gapped environments, strict data residency, full control |

Hybrid Deployment

The application runs on Valohai's infrastructure while compute and data remain in your environment. This ensures your code, data, and models never leave your control.

How it works:

  • Users access Valohai at app.valohai.com

  • Valohai communicates with your Redis queue machine to schedule jobs

  • Workers in your infrastructure pick up and execute jobs

  • All training data and artifacts stay in your storage

  • Only high-level job metadata flows to Valohai's database
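The flow above can be sketched in a few lines of Python. This is an illustrative model of the hybrid job flow, not Valohai's actual agent code; all names are hypothetical, and an in-memory deque stands in for the Redis queue.

```python
# Illustrative sketch of the hybrid job flow -- not Valohai's actual agent code.
from collections import deque

job_queue = deque()   # stands in for the Redis queue on your queue machine
valohai_db = []       # stands in for Valohai's metadata database
local_storage = {}    # stands in for your object storage

def schedule(job):
    """app.valohai.com pushes a job descriptor onto your queue machine."""
    job_queue.append(job)

def worker_step():
    """A worker in your infrastructure pulls and executes one job."""
    job = job_queue.popleft()
    outputs = {"model.bin": b"..."}             # produced by your training code
    local_storage[job["id"]] = outputs          # artifacts stay in your storage
    valohai_db.append({"id": job["id"], "status": "complete"})  # only metadata leaves

schedule({"id": "job-1", "command": "python train.py"})
worker_step()
```

The key point the sketch illustrates: the artifact bytes only ever land in `local_storage` (your environment), while `valohai_db` receives nothing beyond job IDs and statuses.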

Requirements:

  • Compute environment (one of):

    • Cloud account (AWS, Azure, GCP, Scaleway, Oracle)

    • On-premises infrastructure

    • Kubernetes cluster (optional, for K8s workers)

  • Network access from app.valohai.com to your queue machine

  • Object storage (S3, Azure Blob, GCS, or S3-compatible)
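Before installation, it is worth verifying that the required network paths are open. A minimal reachability check, assuming example host names and the Redis port used in this guide:

```python
# Quick TCP reachability check for the required network paths.
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage (host names are placeholders for your environment):
# for host, port in [("queue.example.com", 63790), ("s3.amazonaws.com", 443)]:
#     print(host, port, "OK" if can_reach(host, port) else "BLOCKED")
```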

Self-Hosted Deployment

Both application and infrastructure run entirely within your environment. You control updates, access, and all network policies.

How it works:

  • You host the Valohai application in your infrastructure

  • Users access your instance at your domain

  • All components communicate within your network

  • Valohai provides application updates as Docker images
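As an illustration, the components above might be wired together with something like the following Compose file. This is a hypothetical layout only: the image name, registry, and environment variables are placeholders, not Valohai's actual distribution; use the images and settings Valohai provides.

```yaml
# Hypothetical layout only -- use the images and settings Valohai provides.
services:
  valohai-app:
    image: your-registry.example.com/valohai/app:latest   # placeholder image name
    ports:
      - "443:8000"
    environment:
      DATABASE_URL: postgres://valohai:${DB_PASSWORD}@postgres:5432/valohai
      REDIS_URL: rediss://:${REDIS_PASSWORD}@queue:63790
    depends_on: [postgres, queue]
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: valohai
      POSTGRES_USER: valohai
      POSTGRES_PASSWORD: ${DB_PASSWORD}
  queue:
    image: redis:7
    command: ["redis-server", "--port", "63790", "--requirepass", "${REDIS_PASSWORD}"]
```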

Requirements:

  • Infrastructure for the application (bare metal, VM, or Kubernetes)

  • PostgreSQL database

  • Redis instance

  • Object storage

Components

| Component | Purpose | Specifications |
| --- | --- | --- |
| Queue Machine | Manages job scheduling and routing | Static VM or managed Redis service; 2 vCPUs, 4 GB RAM minimum; DNS name; runs Redis for the job queue and short-term logs; TLS certificate for secure communication |
| Workers | Execute ML workloads | Pull jobs from the queue machine; download inputs from object storage; run code in Docker containers; upload outputs to storage; scale down when idle |
| Object Storage | Stores artifacts, models, and logs | Git repository snapshots; execution logs; input datasets; output artifacts (models, visualizations) |
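For a self-managed queue machine, the Redis side of the description above typically translates into configuration along these lines. This is an illustrative fragment; the file paths and password are placeholders for values from your own environment.

```
# Illustrative redis.conf fragment for a Valohai queue machine.
port 0                                    # disable the plaintext port
tls-port 63790                            # TLS-only, on the port Valohai and workers use
tls-cert-file /etc/redis/tls/queue.crt    # e.g. a Let's Encrypt certificate
tls-key-file  /etc/redis/tls/queue.key
requirepass <your-redis-password>         # stored in your secret manager
```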

Worker Types

Valohai supports flexible worker infrastructure to match your environment:

| Type | Description | Use Case |
| --- | --- | --- |
| Autoscaled VMs | Cloud instances that scale based on demand | AWS, Azure, GCP, Scaleway, Oracle Cloud (OCI) |
| Kubernetes Pods | Containerized workers in K8s clusters | Cloud or on-premises Kubernetes |
| Static Machines | Physical or virtual machines | On-premises data centers, dedicated hardware |
| SLURM Nodes | HPC cluster integration | Existing SLURM-managed compute clusters |

Supported Object Storage

  • AWS S3

  • Azure Blob Storage

  • Google Cloud Storage

  • Oracle Cloud Infrastructure (OCI) Object Storage

  • S3-compatible storage (MinIO, NetApp, etc.)

Networking & Security

Hybrid Deployment

| Source | Destination | Port | Purpose |
| --- | --- | --- | --- |
| app.valohai.com (34.248.245.191, 63.34.156.112) | Queue machine | 63790 | Job scheduling and queue management |
| Workers | Queue machine | 63790 | Pull jobs from Redis queue |
| Workers | app.valohai.com | 443 | Report status and metadata |
| Workers | Object storage | 443 | Download inputs, upload outputs |
| Workers | Internet | 443 | Pull Docker images and packages |
| Queue machine | Internet | 80 | Let's Encrypt certificate renewal (or use your own certificate) |
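The egress rules above can be encoded as data and checked programmatically, which is handy when auditing firewall configuration. A sketch in which the rule set simply mirrors the documented paths:

```python
# The hybrid-deployment network paths from the table above, as data.
ALLOWED_PATHS = {
    ("app.valohai.com", "queue", 63790),   # job scheduling and queue management
    ("workers", "queue", 63790),           # pull jobs from Redis queue
    ("workers", "app.valohai.com", 443),   # report status and metadata
    ("workers", "storage", 443),           # download inputs, upload outputs
    ("workers", "internet", 443),          # pull Docker images and packages
    ("queue", "internet", 80),             # Let's Encrypt certificate renewal
}

def is_allowed(source: str, destination: str, port: int) -> bool:
    """Check a (source, destination, port) tuple against the documented paths."""
    return (source, destination, port) in ALLOWED_PATHS
```

Note that no rule has workers as the destination, matching the security principle below: workers never accept inbound connections.

```python
# e.g. is_allowed("app.valohai.com", "workers", 443) -> False
```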

Security principles:

  • Code and data never leave your environment

  • Workers never accept inbound connections from Valohai

  • Only job metadata and logs are sent to Valohai's database

  • All communication uses TLS encryption

Self-Hosted Deployment

| Source | Destination | Port | Purpose |
| --- | --- | --- | --- |
| Users | Valohai app | 443 | Web interface access |
| Valohai app | PostgreSQL | 5432 | Database queries (internal only) |
| Valohai app | Queue machine | 63790 | Job scheduling |
| Workers | Queue machine | 63790 | Pull jobs from Redis queue |
| Workers | Valohai app | 443 | Report status and metadata |
| Workers | Object storage | 443 | Download inputs, upload outputs |
💡 In self-hosted deployments, all network policies are under your control. The ports listed are defaults and can be customized during setup.

Choose Your Installation Path

  • I have AWS

  • I have Azure

  • I have GCP

  • I have Kubernetes

  • I have on-premises servers

  • I have OpenShift

  • I have a SLURM cluster

What You'll Need

Every installation requires:

From Valohai:

  • Queue address (e.g., yourcompany.vqueue.net)

    • You can also provision your own certificate

  • Redis password (stored in your secret manager)

  • For self-hosted: Docker images and license details

From your environment:

  • Cloud account with appropriate permissions

  • Object storage bucket

  • Network connectivity between components

  • For GPU workloads: GPU quota and drivers

Before you start:

  • Review your organization's network and security policies

  • Ensure you have necessary cloud permissions

  • Decide on resource allocation (VM types, storage size)

  • Plan your environment separation (dev/prod)

Getting Help

Contact [email protected] if you:

  • Need help choosing the right deployment model

  • Have specific security or compliance requirements

  • Want guidance on resource sizing

  • Encounter issues during installation
