Self-Hosted Deployment
This guide contains YAML templates and instructions for setting up a self-hosted Valohai installation on an OpenShift cluster.
Depending on your organization's infrastructure, you may need to adjust these steps to fit your environment.
Need help with custom configurations? Contact your Valohai representative for assistance with specific login options, email server connections, or other custom requirements.
Prerequisites
Existing infrastructure:
An OpenShift cluster where you have administrative (or otherwise sufficient) privileges
At least one node with 4 CPUs and 16 GB RAM for Valohai core services
Tools:
CLI access to the OpenShift cluster (the oc command-line tool)
From Valohai:
Contact [email protected] to receive:
Docker images for the Valohai application
Kubernetes YAML templates
Configuration values
Architecture
Valohai's self-hosted setup comprises four core components:
Application Components:
Valohai application (roi) - Main web app
PostgreSQL - Database for metadata and records (can use RDS instead)
Redis - Job queue and caching layer (can use ElastiCache instead)
Optimo - Bayesian optimization service
Namespace:
These components typically run inside the same namespace (e.g., valohai or default).
Network Communication:
Ensure appropriate NetworkPolicies (if enabled) allow the following communication (a sample policy sketch follows the list):
Valohai ↔ Redis on port 6379
Valohai ↔ Postgres on port 5432
Valohai ↔ Optimo on port 80
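If your cluster enforces NetworkPolicies, a minimal sketch of a policy that lets the Valohai application reach Redis might look like the following. The namespace and app labels are assumptions; adjust them to match the labels used in your manifests, and create equivalent policies for Postgres (5432) and Optimo (80).

```yaml
# Hypothetical NetworkPolicy: allow pods labeled app=valohai to reach
# the Redis pod on port 6379. Namespace and labels are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-valohai-to-redis
  namespace: valohai
spec:
  podSelector:
    matchLabels:
      app: redis
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: valohai
      ports:
        - protocol: TCP
          port: 6379
```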
Clone the Repository
Get the Valohai self-hosted Kubernetes manifests:
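For example (the repository URL is a placeholder; use the location your Valohai contact provides):

```bash
# Clone the manifests repository provided by Valohai (URL is a placeholder)
git clone <valohai-selfhosted-manifests-url>
cd <valohai-selfhosted-manifests-dir>
```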
Configure Settings
You need to configure three files before deployment.
Database Configuration
Edit db-config-configmap.yaml:
Generate a strong password containing uppercase letters, lowercase letters, and numbers (no special characters).
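One way to generate such a password (a sketch; any generator that produces only alphanumeric characters works, and the same approach can be used for the Optimo password below):

```bash
# Generate a 32-character alphanumeric password (no special characters)
openssl rand -base64 48 | tr -dc 'A-Za-z0-9' | head -c 32; echo
```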
Optimo Configuration
Edit optimo-deployment.yaml:
Generate a strong password containing uppercase letters, lowercase letters, and numbers (no special characters).
Application Configuration
Edit roi-config-configmap.yaml:
Required values:
Generate secure keys:
Run this command three times to generate unique values for SECRET_KEY, REPO_PRIVATE_KEY_SECRET, and STATS_JWT_KEY.
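For example (a sketch; any cryptographically secure generator of long random strings works):

```bash
# Generate one long URL-safe random string per run; run three times,
# once each for SECRET_KEY, REPO_PRIVATE_KEY_SECRET, and STATS_JWT_KEY
python3 -c "import secrets; print(secrets.token_urlsafe(64))"
```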
Optional configurations:
Add these to roi-config-configmap.yaml if needed:
Discuss additional settings with Valohai support.
Prepare the Valohai Docker Image
Valohai will provide a Docker image for the application.
Push to OpenShift Registry
If using OpenShift's internal registry:
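A sketch of loading, tagging, and pushing the image (the image archive name, tag, and external registry route are assumptions; the default registry route must be enabled on your cluster):

```bash
# Load the image archive provided by Valohai (filename and tag are placeholders)
docker load -i valohai-roi.tar

# Log in to the internal registry via its default external route
REGISTRY=default-route-openshift-image-registry.apps.<cluster-domain>
docker login -u "$(oc whoami)" -p "$(oc whoami -t)" "$REGISTRY"

# Tag and push into the valohai project
docker tag valohai/roi:<tag> "$REGISTRY"/valohai/roi:<tag>
docker push "$REGISTRY"/valohai/roi:<tag>
```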
Update Deployment
Edit valohai-deployment.yaml to reference your image:
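The relevant part of the Deployment might look like this (the container name, image path, and pull-secret name are assumptions):

```yaml
# Excerpt from valohai-deployment.yaml: point the container at your pushed image
spec:
  template:
    spec:
      containers:
        - name: valohai
          image: image-registry.openshift-image-registry.svc:5000/valohai/roi:<tag>
      imagePullSecrets:
        - name: <pull-secret-name>  # only if your registry requires one
```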
Ensure the pull secret (if needed) is properly configured on your OpenShift cluster.
Note: In addition to the Valohai application, you will have separate pods for the database (postgres), job queue (redis), and Bayesian optimization (optimo). These images are publicly available, so no changes are needed to those YAML files.
Create Project/Namespace
Create a namespace for Valohai:
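For example:

```bash
# Create a project (namespace) named "valohai"
oc new-project valohai
```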
Deploy Valohai
Apply all YAML files:
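For example, from the directory containing the manifests:

```bash
# Apply every manifest to the valohai namespace
oc apply -f . -n valohai
```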
Verify Deployment
Check that resources are up:
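For example, assuming the valohai namespace created above:

```bash
oc get pods -n valohai
```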
You should see pods for valohai, postgres, redis, and optimo running.
Wait for all pods to be in Running state:
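For example:

```bash
# Watch pod status until everything reports Running
oc get pods -n valohai -w
```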
Press Ctrl+C when all pods are running.
Create Admin User
After the Valohai pods are running, create an admin user to log into the web interface.
1. Shell into the Valohai pod:
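For example, assuming the Deployment is named valohai:

```bash
# Open a shell inside the Valohai application pod
oc rsh -n valohai deployment/valohai
```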
2. Run the initialization command:
This creates an admin account with credentials printed to stdout. Save these credentials securely.
3. Exit the pod by typing exit or pressing Ctrl+D.
Expose the Valohai Web App
In OpenShift, use Routes to expose services externally.
Create Route
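For example, assuming the service is named valohai:

```bash
# Create a Route that exposes the Valohai service
oc expose service valohai -n valohai
```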
Get the Route
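For example:

```bash
oc get route -n valohai
```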
OpenShift will generate a hostname. You can access your Valohai web UI at that address.
Configure HTTPS/TLS
By default, oc expose creates a plain HTTP route. For HTTPS, configure TLS termination on the route.
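For example, an edge-terminated route using your own certificate (a sketch; the route name, hostname, and certificate file paths are placeholders):

```bash
# Create an edge-terminated TLS route with your certificate and key
oc create route edge valohai-https \
  --service=valohai \
  --cert=tls.crt --key=tls.key \
  --hostname=valohai.example.com \
  -n valohai
```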
Refer to OpenShift's documentation on creating secure routes.
Set Up Workers
Valohai needs workers to run your machine learning workloads. You have several options:
OpenShift/Kubernetes Workers
For easier installation of OpenShift workers, we recommend using Helm.
Install with Helm:
A Helm chart is available for installing Valohai workers on OpenShift clusters.
Contact your Valohai representative to receive the required custom-values.yaml file.
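A sketch of the installation (the chart repository URL, chart name, release name, and worker namespace are placeholders; use the values your Valohai representative provides):

```bash
# Add the chart repository and install the worker chart with the provided values
helm repo add <valohai-repo-name> <valohai-chart-repo-url>
helm repo update
helm install valohai-workers <valohai-repo-name>/<worker-chart-name> \
  --namespace valohai-workers --create-namespace \
  -f custom-values.yaml
```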
Once installation is complete, supply the installer output to the Valohai team along with connection information to your Kubernetes API (hostname, port).
Note: The installer output might be incomplete with placeholders if Helm reports back before resources are fully initialized. Wait a moment and rerun the command to get complete output.
Alternative Worker Options
You can also use:
On-premises servers: Ubuntu installer or manual install
Autoscaled EC2 instances: AWS hybrid deployment
Important: Workers must be able to reach the Redis queue (port 6379) that was set up in your cluster during this installation.
Set Up Data Store
Valohai requires an S3-compatible data store. Options include:
MinIO deployed on the cluster
An S3 bucket or S3-compatible bucket in your own account
Discuss with your Valohai contact which option best fits your needs.
Database and Redis Options
In-Cluster (Development)
The YAML templates include PostgreSQL and Redis deployments.
Use for: Development and testing environments
Considerations:
Requires persistent volume management
Manual backup procedures
Less robust for production
Managed Services (Production)
For production, consider using managed services:
Amazon RDS for PostgreSQL:
Automated backups
Multi-AZ high availability
Managed updates
Amazon ElastiCache for Redis:
Automated failover
Managed scaling
Better performance
If using managed services:
Remove the in-cluster postgres-deployment.yaml and redis-deployment.yaml before deploying
Update DATABASE_URL and REDIS_URL in roi-config-configmap.yaml to point to your managed services
Monitoring
View Pod Logs
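For example, to follow the application logs (assuming the Deployment is named valohai):

```bash
oc logs -f deployment/valohai -n valohai
```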
Check Pod Status
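For example:

```bash
oc get pods -n valohai -o wide
```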
Check Resource Usage
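For example (requires cluster metrics to be available):

```bash
oc adm top pods -n valohai
```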
Troubleshooting
Pods Not Starting
Check pod status:
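For example:

```bash
oc get pods -n valohai
oc describe pod <pod-name> -n valohai
```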
Common issues:
Image pull errors (check registry credentials)
Insufficient resources (check node capacity)
Failed health checks (check application logs)
Database Connection Errors
Verify service:
Test connection from pod:
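A sketch covering both checks (the service name postgres and the availability of python3 inside the application image are assumptions):

```bash
# Verify the database service exists
oc get svc postgres -n valohai

# Test TCP connectivity to the database from the Valohai pod
oc exec -n valohai deployment/valohai -- \
  python3 -c "import socket; socket.create_connection(('postgres', 5432), timeout=5); print('ok')"
```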
Cannot Access Web UI
Check route:
Verify service:
Check pod health:
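A combined sketch of the three checks (the service name is an assumption):

```bash
oc get route -n valohai
oc get svc valohai -n valohai
oc get pods -n valohai
oc logs deployment/valohai -n valohai --tail=100
```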
Getting Help
Valohai Support: [email protected]
Include in support requests:
OpenShift version
Pod logs: oc logs <pod-name> -n valohai
Pod descriptions: oc describe pod <pod-name> -n valohai
Recent events: oc get events -n valohai --sort-by='.lastTimestamp'
Description of the issue and when it started
