Self-Hosted Deployment
Deploy a fully self-hosted Valohai installation on your OpenShift cluster
This guide contains YAML templates and instructions for setting up a self-hosted Valohai installation on an OpenShift cluster.
Depending on your organization's infrastructure, you may need to adjust these steps to fit your environment.
Need help with custom configurations? Contact your Valohai representative for assistance with specific login options, email server connections, or other custom requirements.
Prerequisites
Existing infrastructure:
An OpenShift cluster on which you have administrative (or otherwise sufficient) privileges
At least one node with 4 CPUs and 16 GB RAM for Valohai core services
Tools:
Access to the OpenShift cluster from your CLI
From Valohai:
Contact Valohai support to receive:
Docker images for the Valohai application
Kubernetes YAML templates
Configuration values
Architecture
Valohai's self-hosted setup comprises four core components:
Application Components:
Valohai application (roi) - Main web app
PostgreSQL - Database for metadata and records (can use RDS instead)
Redis - Job queue and caching layer (can use ElastiCache instead)
Optimo - Bayesian optimization service
Namespace:
These components typically run inside the same namespace (e.g., valohai or default).
Network Communication:
Ensure appropriate NetworkPolicies (if enabled) allow communication:
Valohai ↔ Redis on port 6379
Valohai ↔ Postgres on port 5432
Valohai ↔ Optimo on port 80
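The port requirements above can be spot-checked with a short script run from a pod in the same namespace. This is a minimal sketch, not part of the Valohai tooling: the function names are hypothetical, the redis and postgres hostnames are taken from the connection URLs used later in this guide, and the optimo hostname is an assumption about the Service name in your manifests.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_all(targets: dict[str, int]) -> dict[str, bool]:
    """Check every host/port pair and return a reachability map."""
    return {f"{host}:{port}": can_connect(host, port) for host, port in targets.items()}


# Assumed Service names; adjust them to match your manifests.
DEFAULT_TARGETS = {"redis": 6379, "postgres": 5432, "optimo": 80}
```

Calling check_all(DEFAULT_TARGETS) from inside the Valohai pod returns a dictionary such as {"redis:6379": True, ...}. A False entry suggests a NetworkPolicy or Service problem rather than an application error.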
Clone the Repository
Get the Valohai self-hosted Kubernetes manifests:
git clone https://github.com/valohai/valohai-self-hosted-k8.git
cd valohai-self-hosted-k8
Configure Settings
You need to configure three files before deployment.
Database Configuration
Edit db-config-configmap.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: db-config
  namespace: valohai
data:
  POSTGRES_PASSWORD: "<uppercase-lowercase-letters-numbers>"
Generate a strong password with uppercase, lowercase letters, and numbers (no special characters).
Optimo Configuration
Edit optimo-deployment.yaml:
env:
  - name: OPTIMO_BASIC_AUTH_PASSWORD
    value: "<uppercase-lowercase-letters-numbers>"
Generate a strong password with uppercase, lowercase letters, and numbers (no special characters).
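One way to produce a password that meets the rule above (letters and digits only) is sketched below. The function name and the default length of 32 are illustrative choices, not Valohai requirements.

```python
import secrets
import string

# Only ASCII letters and digits, so the value is safe inside connection URLs.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 32) -> str:
    """Return a random alphanumeric password that contains at least one
    uppercase letter, one lowercase letter, and one digit."""
    if length < 3:
        raise ValueError("length must be at least 3")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (any(c.isupper() for c in candidate)
                and any(c.islower() for c in candidate)
                and any(c.isdigit() for c in candidate)):
            return candidate


if __name__ == "__main__":
    print(generate_password())
```

Run it once for POSTGRES_PASSWORD and once for OPTIMO_BASIC_AUTH_PASSWORD so the two values differ.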
Application Configuration
Edit roi-config-configmap.yaml:
Required values:
apiVersion: v1
kind: ConfigMap
metadata:
  name: roi-config
  namespace: valohai
data:
  # Database connection - PASSWORD must match POSTGRES_PASSWORD from db-config-configmap.yaml
  DATABASE_URL: "postgresql://postgres:<password>@postgres:5432/valohai"
  # Application URL - The external URL users will access
  URL_BASE: "https://valohai.yourdomain.com"
  # Security keys - Generate random strings with uppercase, lowercase, and numbers
  SECRET_KEY: "<generate-random-string>"
  REPO_PRIVATE_KEY_SECRET: "<generate-random-string>"
  STATS_JWT_KEY: "<generate-random-string>"
  # Optimo connection - Must match OPTIMO_BASIC_AUTH_PASSWORD from optimo-deployment.yaml
  OPTIMO_BASIC_AUTH_PASSWORD: "<same-as-optimo>"
  # Redis connection
  REDIS_URL: "redis://redis:6379/0"
Generate secure keys:
python3 -c "import secrets; print(secrets.token_urlsafe(50))"
Run this command three times to generate unique values for SECRET_KEY, REPO_PRIVATE_KEY_SECRET, and STATS_JWT_KEY. Note that token_urlsafe output can also contain the characters - and _.
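Because the database password appears in two files, it is easy for the values to drift apart. The sketch below builds DATABASE_URL from a password and checks that an existing URL carries the expected one. The function names are hypothetical. The default user, host, port, and database name are copied from the example DATABASE_URL above.

```python
from urllib.parse import urlsplit


def build_database_url(password: str, host: str = "postgres", port: int = 5432,
                       user: str = "postgres", name: str = "valohai") -> str:
    """Compose a PostgreSQL URL, rejecting passwords with special characters."""
    if not (password.isascii() and password.isalnum()):
        raise ValueError("password must contain only ASCII letters and digits")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


def database_url_matches(database_url: str, postgres_password: str) -> bool:
    """Return True if the password embedded in the URL equals postgres_password."""
    return urlsplit(database_url).password == postgres_password
```

For example, database_url_matches(url, password) can be run against the values in roi-config-configmap.yaml and db-config-configmap.yaml before applying the manifests.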
Optional configurations:
Add these to roi-config-configmap.yaml if needed:
# SMTP for email notifications
EMAIL_HOST: "smtp.yourcompany.com"
EMAIL_PORT: "587"
EMAIL_HOST_USER: "<smtp-username>"
EMAIL_HOST_PASSWORD: "<smtp-password>"
# SSO configuration
SOCIAL_AUTH_SAML_ENABLED_IDPS: '{"your_idp": {...}}'
Discuss additional settings with Valohai support.
Prepare the Valohai Docker Image
Valohai will provide a Docker image for the application.
Push to OpenShift Registry
If using OpenShift's internal registry:
# Login to OpenShift
oc login --token=<your-openshift-token> --server=<openshift-api-url>
# Login to registry
docker login -u <user> -p <token> <registry-url>
# Pull Valohai image
docker pull <valohai-image-from-source>
# Tag for OpenShift registry
docker tag <valohai-image-from-source> <your-openshift-registry>/<namespace>/valohai:latest
# Push to registry
docker push <your-openshift-registry>/<namespace>/valohai:latest
Update Deployment
Edit valohai-deployment.yaml to reference your image:
spec:
  template:
    spec:
      containers:
        - name: valohai
          image: <your-openshift-registry>/<namespace>/valohai:latest
Ensure the pull secret (if needed) is properly configured on your OpenShift cluster.
Note: In addition to the Valohai application, you will have separate pods for database (postgres), job queue (redis), and Bayesian optimization (optimo). These images are publicly available, so no changes are needed to those YAML files.
Create Project/Namespace
Create a namespace for Valohai:
oc new-project valohai
# or
oc create namespace valohai
Deploy Valohai
Apply all YAML files:
kubectl apply -f . -n valohai
# or
oc apply -f . -n valohai
Verify Deployment
Check that resources are up:
oc get pods -n valohai
oc get deployments -n valohai
oc get services -n valohai
You should see pods for valohai, postgres, redis, and optimo running.
Wait for all pods to be in Running state:
oc get pods -n valohai -w
Press Ctrl+C when all pods are running.
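If you prefer a scripted check over watching the output, the pod list can be inspected as JSON. The sketch below assumes you pass it the text produced by oc get pods -n valohai -o json. The function name is hypothetical. It looks only at each pod's phase, so it does not verify readiness probes, and any pod in the Succeeded phase would also be reported.

```python
import json


def pods_not_running(pod_list_json: str) -> list[str]:
    """Return the names of pods whose status.phase is not 'Running'."""
    pod_list = json.loads(pod_list_json)
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") != "Running"
    ]
```

An empty result means every pod in the namespace reports the Running phase.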
Create Admin User
After the Valohai pods are running, create an admin user to log into the web interface.
1. Shell into the Valohai pod:
POD_NAME=$(oc get pod -n valohai -l app=valohai -o jsonpath='{.items[0].metadata.name}')
oc rsh -n valohai "$POD_NAME"
2. Run the initialization command:
python manage.py roi_init --mode dev
This creates an admin account with credentials printed to stdout. Save these credentials securely.
3. Exit the pod:
exit
Or press Ctrl+D.
Expose the Valohai Web App
In OpenShift, use Routes to expose services externally.
Create Route
oc expose svc/valohai -n valohai
Get the Route
oc get routes -n valohai
OpenShift will generate a hostname. You can access your Valohai web UI at that address.
Configure HTTPS/TLS
By default, oc expose creates an HTTP route. For HTTPS/TLS, configure TLS certificates.
Refer to OpenShift's documentation on creating secure routes.
Set Up Workers
Valohai needs workers to run your machine learning workloads. You have several options:
OpenShift/Kubernetes Workers
For easier installation of OpenShift workers, we recommend using Helm.
Install with Helm:
A Helm chart is available to install Valohai workers to OpenShift clusters.
Contact your Valohai representative to receive the required custom-values.yaml file.
helm repo add valohai --force-update https://dist.valohai.com/charts/
helm upgrade --install \
  -n valohai-workers \
  --create-namespace \
  valohai-workers \
  valohai/valohai-workers \
  -f custom-values.yaml
Once installation is complete, supply the installer output to the Valohai team along with connection information to your Kubernetes API (hostname, port).
Note: The installer output might be incomplete with placeholders if Helm reports back before resources are fully initialized. Wait a moment and rerun the command to get complete output.
Alternative Worker Options
You can also use:
On-premises servers: Ubuntu installer or manual install
Autoscaled EC2 instances: AWS hybrid deployment
Important: Workers must be able to reach the Redis queue (port 6379) that this installation sets up in your cluster.
Set Up Data Store
Valohai requires an S3-compatible data store. Options include:
MinIO on the cluster
S3 bucket: an S3-compatible bucket in your account
Discuss with your Valohai contact which option best fits your needs.
Database and Redis Options
In-Cluster (Development)
The YAML templates include PostgreSQL and Redis deployments.
Use for: Development and testing environments
Considerations:
Requires persistent volume management
Manual backup procedures
Less robust for production
Managed Services (Production)
For production, consider using managed services:
Amazon RDS for PostgreSQL:
Automated backups
Multi-AZ high availability
Managed updates
Amazon ElastiCache for Redis:
Automated failover
Managed scaling
Better performance
If using managed services:
Remove the in-cluster postgres-deployment.yaml and redis-deployment.yaml before deploying
Update DATABASE_URL and REDIS_URL in roi-config-configmap.yaml to point to your managed services
Monitoring
View Pod Logs
oc logs -f deployment/valohai -n valohai
Check Pod Status
oc get pods -n valohai
oc describe pod <pod-name> -n valohai
Check Resource Usage
oc adm top pods -n valohai
oc adm top nodes
Troubleshooting
Pods Not Starting
Check pod status:
oc get pods -n valohai
oc describe pod <pod-name> -n valohai
Common issues:
Image pull errors (check registry credentials)
Insufficient resources (check node capacity)
Failed health checks (check application logs)
Database Connection Errors
Verify service:
oc get svc postgres -n valohai
Test connection from pod:
oc run -it --rm debug --image=postgres:14 --restart=Never -n valohai -- psql -h postgres -U postgres
Cannot Access Web UI
Check route:
oc describe route valohai -n valohai
Verify service:
oc get svc valohai -n valohai
Check pod health:
oc get pods -n valohai -l app=valohai
Getting Help
Valohai Support: [email protected]
Include in support requests:
OpenShift version
Pod logs:
oc logs <pod-name> -n valohai
Pod descriptions:
oc describe pod <pod-name> -n valohai
Recent events:
oc get events -n valohai --sort-by='.lastTimestamp'
Description of the issue and when it started
