Self-Hosted Deployment
This guide contains YAML templates and instructions for setting up a self-hosted Valohai installation on an OpenShift cluster.
Depending on your organization's infrastructure, you may need to adjust these steps to fit your environment.
Need help with custom configurations? Contact your Valohai representative for assistance with specific login options, email server connections, or other custom requirements.
Prerequisites
Existing infrastructure:
An OpenShift cluster where you have administrative (or otherwise sufficient) privileges
At least one node with 4 CPUs and 16 GB RAM for Valohai core services
Tools:
CLI access to the OpenShift cluster (the oc command-line tool)
From Valohai:
Contact [email protected] to receive:
Docker images for the Valohai application
Kubernetes YAML templates
Configuration values
Architecture
Valohai's self-hosted setup comprises four core components:
Application Components:
Valohai application (roi) - Main web app
PostgreSQL - Database for metadata and records (can use RDS instead)
Redis - Job queue and caching layer (can use ElastiCache instead)
Optimo - Bayesian optimization service
Namespace:
These components typically run inside the same namespace (e.g., valohai or default).
Network Communication:
Ensure appropriate NetworkPolicies (if enabled) allow the following communication (a sample policy sketch follows the list):
Valohai ↔ Redis on port 6379
Valohai ↔ Postgres on port 5432
Valohai ↔ Optimo on port 80
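If your cluster enforces NetworkPolicies, a minimal sketch of a policy that lets the Valohai application reach Redis might look like the following. The namespace and app labels are assumptions; adjust them to match the labels used in your manifests, and create equivalent policies for Postgres (5432) and Optimo (80).

```yaml
# Hypothetical NetworkPolicy: allow pods labeled app=valohai to reach
# the Redis pod on port 6379. Namespace and labels are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-valohai-to-redis
  namespace: valohai
spec:
  podSelector:
    matchLabels:
      app: redis
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: valohai
      ports:
        - protocol: TCP
          port: 6379
```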
Clone the Repository
Get the Valohai self-hosted Kubernetes manifests:
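For example (the repository URL is a placeholder; use the location your Valohai contact provides):

```bash
# Clone the manifests repository provided by Valohai (URL is a placeholder)
git clone <valohai-selfhosted-manifests-url>
cd <valohai-selfhosted-manifests-dir>
```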
Configure Settings
You need to configure three files before deployment.
Database Configuration
Edit db-config-configmap.yaml:
Generate a strong password containing uppercase letters, lowercase letters, and numbers (no special characters).
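One way to generate such a password (a sketch; any generator that produces only alphanumeric characters works, and the same approach can be used for the Optimo password below):

```bash
# Generate a 32-character alphanumeric password (no special characters)
openssl rand -base64 48 | tr -dc 'A-Za-z0-9' | head -c 32; echo
```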
Optimo Configuration
Edit optimo-deployment.yaml:
Generate a strong password containing uppercase letters, lowercase letters, and numbers (no special characters).
Application Configuration
Edit roi-config-configmap.yaml:
Required values:
Generate secure keys:
Run this command three times to generate unique values for SECRET_KEY, REPO_PRIVATE_KEY_SECRET, and STATS_JWT_KEY.
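For example (a sketch; any cryptographically secure generator of long random strings works):

```bash
# Generate one long URL-safe random string per run; run three times,
# once each for SECRET_KEY, REPO_PRIVATE_KEY_SECRET, and STATS_JWT_KEY
python3 -c "import secrets; print(secrets.token_urlsafe(64))"
```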
Optional configurations:
Add these to roi-config-configmap.yaml if needed:
Discuss additional settings with Valohai support.
Prepare the Valohai Docker Image
Valohai will provide a Docker image for the application.
Push to OpenShift Registry
If using OpenShift's internal registry:
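A sketch of loading, tagging, and pushing the image (the image archive name, tag, and external registry route are assumptions; the default registry route must be enabled on your cluster):

```bash
# Load the image archive provided by Valohai (filename and tag are placeholders)
docker load -i valohai-roi.tar

# Log in to the internal registry via its default external route
REGISTRY=default-route-openshift-image-registry.apps.<cluster-domain>
docker login -u "$(oc whoami)" -p "$(oc whoami -t)" "$REGISTRY"

# Tag and push into the valohai project
docker tag valohai/roi:<tag> "$REGISTRY"/valohai/roi:<tag>
docker push "$REGISTRY"/valohai/roi:<tag>
```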
Update Deployment
Edit valohai-deployment.yaml to reference your image:
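The relevant part of the Deployment might look like this (the container name, image path, and pull-secret name are assumptions):

```yaml
# Excerpt from valohai-deployment.yaml: point the container at your pushed image
spec:
  template:
    spec:
      containers:
        - name: valohai
          image: image-registry.openshift-image-registry.svc:5000/valohai/roi:<tag>
      imagePullSecrets:
        - name: <pull-secret-name>  # only if your registry requires one
```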
Ensure the pull secret (if needed) is properly configured on your OpenShift cluster.
Note: In addition to the Valohai application, you will have separate pods for the database (postgres), job queue (redis), and Bayesian optimization (optimo). These images are publicly available, so no changes are needed to those YAML files.
Create Project/Namespace
Create a namespace for Valohai:
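For example:

```bash
# Create a project (namespace) named "valohai"
oc new-project valohai
```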
Deploy Valohai
Apply all YAML files:
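For example, from the directory containing the manifests:

```bash
# Apply every manifest to the valohai namespace
oc apply -f . -n valohai
```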
Verify Deployment
Check that resources are up:
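For example, assuming the valohai namespace created above:

```bash
oc get pods -n valohai
```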
You should see pods for valohai, postgres, redis, and optimo running.
Wait for all pods to be in Running state:
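For example:

```bash
# Watch pod status until everything reports Running
oc get pods -n valohai -w
```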
Press Ctrl+C when all pods are running.
Create Admin User
After the Valohai pods are running, create an admin user to log into the web interface.
1. Shell into the Valohai pod:
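For example, assuming the Deployment is named valohai:

```bash
# Open a shell inside the Valohai application pod
oc rsh -n valohai deployment/valohai
```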
2. Run the initialization command:
This creates an admin account with credentials printed to stdout. Save these credentials securely.
3. Exit the pod by typing exit or pressing Ctrl+D.
Expose the Valohai Web App
In OpenShift, use Routes to expose services externally.
Create Route
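For example, assuming the service is named valohai:

```bash
# Create a Route that exposes the Valohai service
oc expose service valohai -n valohai
```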
Get the Route
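For example:

```bash
oc get route -n valohai
```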
OpenShift will generate a hostname. You can access your Valohai web UI at that address.
Configure HTTPS/TLS
By default, oc expose creates a plain HTTP route. For HTTPS, configure TLS termination on the route.
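For example, an edge-terminated route using your own certificate (a sketch; the route name, hostname, and certificate file paths are placeholders):

```bash
# Create an edge-terminated TLS route with your certificate and key
oc create route edge valohai-https \
  --service=valohai \
  --cert=tls.crt --key=tls.key \
  --hostname=valohai.example.com \
  -n valohai
```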
Refer to OpenShift's documentation on creating secure routes.
Set Up Workers
Valohai needs workers to run your machine learning workloads. You have several options:
OpenShift/Kubernetes Workers
For easier installation of OpenShift workers, we recommend using Helm.
Install with Helm:
A Helm chart is available for installing Valohai workers on OpenShift clusters.
Contact your Valohai representative to receive the required custom-values.yaml file.
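A sketch of the installation (the chart repository URL, chart name, release name, and worker namespace are placeholders; use the values your Valohai representative provides):

```bash
# Add the chart repository and install the worker chart with the provided values
helm repo add <valohai-repo-name> <valohai-chart-repo-url>
helm repo update
helm install valohai-workers <valohai-repo-name>/<worker-chart-name> \
  --namespace valohai-workers --create-namespace \
  -f custom-values.yaml
```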
Once installation is complete, supply the installer output to the Valohai team along with connection information to your Kubernetes API (hostname, port).
Note: The installer output might be incomplete with placeholders if Helm reports back before resources are fully initialized. Wait a moment and rerun the command to get complete output.
Alternative Worker Options
You can also use:
On-premises servers: Ubuntu installer or manual install
Autoscaled EC2 instances: AWS hybrid deployment
Important: Workers must be able to reach the Redis queue (port 6379) that was set up in your cluster during this installation.
Set Up Data Store
Valohai requires an S3-compatible data store. Options include:
MinIO deployed on the cluster
An S3 bucket or S3-compatible bucket in your own account
Discuss with your Valohai contact which option best fits your needs.
Database and Redis Options
In-Cluster (Development)
The YAML templates include PostgreSQL and Redis deployments.
Use for: Development and testing environments
Considerations:
Requires persistent volume management
Manual backup procedures
Less robust for production
Managed Services (Production)
For production, consider using managed services:
Amazon RDS for PostgreSQL:
Automated backups
Multi-AZ high availability
Managed updates
Amazon ElastiCache for Redis:
Automated failover
Managed scaling
Better performance
If using managed services:
Remove the in-cluster postgres-deployment.yaml and redis-deployment.yaml before deploying
Update DATABASE_URL and REDIS_URL in roi-config-configmap.yaml to point to your managed services
Monitoring
View Pod Logs
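For example, to follow the application logs (assuming the Deployment is named valohai):

```bash
oc logs -f deployment/valohai -n valohai
```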
Check Pod Status
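For example:

```bash
oc get pods -n valohai -o wide
```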
Check Resource Usage
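For example (requires cluster metrics to be available):

```bash
oc adm top pods -n valohai
```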
Troubleshooting
Pods Not Starting
Check pod status:
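For example:

```bash
oc get pods -n valohai
oc describe pod <pod-name> -n valohai
```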
Common issues:
Image pull errors (check registry credentials)
Insufficient resources (check node capacity)
Failed health checks (check application logs)
Database Connection Errors
Verify service:
Test connection from pod:
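A sketch covering both checks (the service name postgres and the availability of python3 inside the application image are assumptions):

```bash
# Verify the database service exists
oc get svc postgres -n valohai

# Test TCP connectivity to the database from the Valohai pod
oc exec -n valohai deployment/valohai -- \
  python3 -c "import socket; socket.create_connection(('postgres', 5432), timeout=5); print('ok')"
```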
Cannot Access Web UI
Check route:
Verify service:
Check pod health:
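A combined sketch of the three checks (the service name is an assumption):

```bash
oc get route -n valohai
oc get svc valohai -n valohai
oc get pods -n valohai
oc logs deployment/valohai -n valohai --tail=100
```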
Getting Help
Valohai Support: [email protected]
Include in support requests:
OpenShift version
Pod logs: oc logs <pod-name> -n valohai
Pod descriptions: oc describe pod <pod-name> -n valohai
Recent events: oc get events -n valohai --sort-by='.lastTimestamp'
Description of the issue and when it started
