# Self-Hosted Deployment

This guide explains how to set up a self-hosted Valohai installation on an OpenShift cluster using the Kubernetes YAML templates provided by Valohai.

Depending on your organization's infrastructure, you may need to adjust these steps to fit your environment.

> **Need help with custom configurations?** Contact your Valohai representative for assistance with specific login options, email server connections, or other custom requirements.

## Prerequisites

**Existing infrastructure:**

* An OpenShift cluster where you have administrative privileges, or at least permission to create projects, deployments, services, and routes
* At least one node with 4 CPUs and 16 GB RAM for Valohai core services

**Tools:**

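* [oc CLI](https://docs.openshift.com/container-platform/latest/cli_reference/openshift_cli/getting-started-cli.html) installed and configured (most commands in this guide use `oc`; generic Kubernetes commands also work with [kubectl](https://kubernetes.io/docs/tasks/tools/))
* Access to the OpenShift cluster from your CLI
* Docker (or a compatible client) for pulling and pushing the Valohai image
* [Helm](https://helm.sh/docs/intro/install/), if you install Kubernetes workers with the Helm chart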

**From Valohai:**

Contact **<support@valohai.com>** to receive:

* Docker images for the Valohai application
* Kubernetes YAML templates
* Configuration values

## Architecture

Valohai's self-hosted setup comprises four core components:

**Application Components:**

* **Valohai application (roi)** - Main web app
* **PostgreSQL** - Database for metadata and records (Amazon RDS can be used instead)
* **Redis** - Job queue and caching layer (Amazon ElastiCache can be used instead)
* **Optimo** - Bayesian optimization service

**Namespace:**

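These components typically run in the same namespace. This guide uses `valohai`; if you choose a different name, update the `namespace` fields in the YAML files and the `-n` flags in the commands accordingly.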

**Network Communication:**

If NetworkPolicies are enabled in the namespace, make sure they allow the following traffic:

* Valohai ↔ Redis on port 6379
* Valohai ↔ Postgres on port 5432
* Valohai ↔ Optimo on port 80
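Once the pods are running (see Verify Deployment below), you can confirm these ports from inside the Valohai pod. The sketch below is not part of the Valohai tooling: `check_port` is a hypothetical helper, and it assumes the Valohai image ships `bash` and `timeout` and that the services are named `redis`, `postgres`, and `optimo` as in the templates.

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and on the `timeout` command.
check_port() {
  local host="$1" port="$2"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "OK   ${host}:${port}"
  else
    echo "FAIL ${host}:${port}"
    return 1
  fi
}
```

For example, after opening a shell with `oc rsh`, paste the function and run `check_port redis 6379`, `check_port postgres 5432`, and `check_port optimo 80`; each call prints `OK` or `FAIL` for its target.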

## Clone the Repository

Get the Valohai self-hosted Kubernetes manifests:

```shell
git clone https://github.com/valohai/valohai-self-hosted-k8.git
cd valohai-self-hosted-k8
```

## Configure Settings

You need to configure three files before deployment.

### Database Configuration

Edit `db-config-configmap.yaml`:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: db-config
  namespace: valohai
data:
  POSTGRES_PASSWORD: "<uppercase-lowercase-letters-numbers>"
```

**Generate a strong password** using only uppercase letters, lowercase letters, and numbers (no special characters).
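One way to produce such a password is to filter random bytes down to the allowed characters. A minimal sketch, assuming a Unix-like shell with `/dev/urandom`, `tr`, and `head`; `gen_password` is a hypothetical helper name:

```shell
# Hypothetical helper: print a random password made only of A-Z, a-z and 0-9.
# tr deletes every byte from /dev/urandom outside the allowed set, and head
# keeps the first LENGTH characters (default: 32).
gen_password() {
  local length="${1:-32}"
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
  echo
}

gen_password      # prints a 32-character password
gen_password 40   # prints a 40-character password
```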

### Optimo Configuration

Edit `optimo-deployment.yaml`:

```yaml
env:
  - name: OPTIMO_BASIC_AUTH_PASSWORD
    value: "<uppercase-lowercase-letters-numbers>"
```

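**Generate a strong password** using only uppercase letters, lowercase letters, and numbers (no special characters). You will reuse this value as `OPTIMO_BASIC_AUTH_PASSWORD` in `roi-config-configmap.yaml`.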
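Both `db-config-configmap.yaml` and `optimo-deployment.yaml` use the same placeholder text, `<uppercase-lowercase-letters-numbers>`. A sketch that replaces it in one file with a freshly generated password and prints the value, assuming the placeholder appears exactly as in the snippets above; `fill_password` is hypothetical. It writes through a temporary file because GNU and BSD `sed` handle `-i` differently.

```shell
# Hypothetical helper: replace the literal placeholder
# <uppercase-lowercase-letters-numbers> in FILE with a new 32-character
# alphanumeric password, then print that password.
fill_password() {
  local file="$1" password
  password="$(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32)"
  # Write to a temporary file and move it into place instead of using `sed -i`.
  sed "s/<uppercase-lowercase-letters-numbers>/${password}/g" "$file" > "${file}.tmp" &&
    mv "${file}.tmp" "$file" &&
    printf '%s\n' "$password"
}

# Usage, from the repository directory:
#   fill_password db-config-configmap.yaml
#   fill_password optimo-deployment.yaml
```

Run it once per file and keep the printed values: the one for `db-config-configmap.yaml` goes into `DATABASE_URL`, and the one for `optimo-deployment.yaml` goes into `OPTIMO_BASIC_AUTH_PASSWORD` in `roi-config-configmap.yaml`.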

### Application Configuration

Edit `roi-config-configmap.yaml`:

**Required values:**

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: roi-config
  namespace: valohai
data:
  # Database connection - <password> must match POSTGRES_PASSWORD in db-config-configmap.yaml
  DATABASE_URL: "postgresql://postgres:<password>@postgres:5432/valohai"

  # Application URL - The external URL users will access
  URL_BASE: "https://valohai.yourdomain.com"

  # Security keys - Generate a unique random string for each key (see the command below)
  SECRET_KEY: "<generate-random-string>"
  REPO_PRIVATE_KEY_SECRET: "<generate-random-string>"
  STATS_JWT_KEY: "<generate-random-string>"

  # Optimo connection - Must match OPTIMO_BASIC_AUTH_PASSWORD from optimo-deployment.yaml
  OPTIMO_BASIC_AUTH_PASSWORD: "<same-as-optimo>"

  # Redis connection
  REDIS_URL: "redis://redis:6379/0"
```

**Generate secure keys:**

```shell
python3 -c "import secrets; print(secrets.token_urlsafe(50))"
```

Run this command three times to generate unique values for `SECRET_KEY`, `REPO_PRIVATE_KEY_SECRET`, and `STATS_JWT_KEY`.
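Since two of these passwords must be copied between files, a quick consistency check before deploying can catch typos. A minimal sketch using `sed` and `grep`; `check_passwords` is hypothetical, and the text matching assumes the formatting shown in the snippets above rather than parsing YAML:

```shell
# Hypothetical helper: verify that the passwords shared between files match.
# Plain text matching that assumes double-quoted values and `value:` directly
# below `name: OPTIMO_BASIC_AUTH_PASSWORD`; it is not a YAML parser.
check_passwords() {
  local dir="${1:-.}" status=0
  local db_pw url_pw optimo_pw roi_optimo_pw

  db_pw="$(sed -n 's/^ *POSTGRES_PASSWORD: *"\(.*\)".*/\1/p' "$dir/db-config-configmap.yaml")"
  # Password part of postgresql://<user>:<password>@<host>...
  url_pw="$(sed -n 's/^ *DATABASE_URL: *"postgresql:\/\/[^:]*:\([^@]*\)@.*/\1/p' "$dir/roi-config-configmap.yaml")"
  optimo_pw="$(grep -A1 'name: OPTIMO_BASIC_AUTH_PASSWORD' "$dir/optimo-deployment.yaml" |
    sed -n 's/^ *value: *"\(.*\)".*/\1/p')"
  roi_optimo_pw="$(sed -n 's/^ *OPTIMO_BASIC_AUTH_PASSWORD: *"\(.*\)".*/\1/p' "$dir/roi-config-configmap.yaml")"

  if [ -z "$db_pw" ] || [ "$db_pw" != "$url_pw" ]; then
    echo "Mismatch: POSTGRES_PASSWORD vs password in DATABASE_URL" >&2
    status=1
  fi
  if [ -z "$optimo_pw" ] || [ "$optimo_pw" != "$roi_optimo_pw" ]; then
    echo "Mismatch: OPTIMO_BASIC_AUTH_PASSWORD in optimo-deployment.yaml vs roi-config-configmap.yaml" >&2
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "Passwords are consistent"
  fi
  return "$status"
}
```

Run `check_passwords` from the repository directory; it prints `Passwords are consistent` or reports each mismatch and returns a non-zero status.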

**Optional configurations:**

Add any of these under the `data:` section of `roi-config-configmap.yaml` as needed:

```yaml
  # SMTP for email notifications
  EMAIL_HOST: "smtp.yourcompany.com"
  EMAIL_PORT: "587"
  EMAIL_HOST_USER: "valohai@yourcompany.com"
  EMAIL_HOST_PASSWORD: "<smtp-password>"

  # SSO configuration
  SOCIAL_AUTH_SAML_ENABLED_IDPS: '{"your_idp": {...}}'
```

Discuss additional settings with Valohai support.

## Prepare the Valohai Docker Image

Valohai will provide a Docker image for the application.

### Push to OpenShift Registry

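If you use OpenShift's internal registry, create the target project first (see Create Project/Namespace below), because images are pushed into a project's namespace: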

```shell
# Log in to OpenShift
oc login --token=<your-openshift-token> --server=<openshift-api-url>

# Log in to the registry, passing your OpenShift token on stdin instead of the command line
oc whoami -t | docker login -u <user> --password-stdin <registry-url>

# Pull Valohai image
docker pull <valohai-image-from-source>

# Tag for OpenShift registry
docker tag <valohai-image-from-source> <your-openshift-registry>/<namespace>/valohai:latest

# Push to registry
docker push <your-openshift-registry>/<namespace>/valohai:latest
```

### Update Deployment

Edit `valohai-deployment.yaml` to reference your image:

```yaml
spec:
  template:
    spec:
      containers:
      - name: valohai
        image: <your-openshift-registry>/<namespace>/valohai:latest
```

If the registry requires authentication, make sure a pull secret for it is configured in the `valohai` namespace.
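When a pull secret is needed (typically not for images in the internal registry of the same project), one way to set it up is to store the credentials as a `docker-registry` secret and link it to the service account that pulls the images. A sketch assuming the Valohai pods run under the `default` service account; the helper `create_pull_secret` and the secret name `valohai-pull-secret` are hypothetical:

```shell
# Hypothetical helper: store registry credentials as a pull secret and let the
# `default` service account in the valohai namespace use it for image pulls.
# Usage: create_pull_secret <registry-url> <username> <password-or-token>
create_pull_secret() {
  local registry="$1" user="$2" token="$3"
  local secret_name="valohai-pull-secret"  # hypothetical name
  local namespace="valohai"

  oc create secret docker-registry "$secret_name" \
    --docker-server="$registry" \
    --docker-username="$user" \
    --docker-password="$token" \
    -n "$namespace" &&
    oc secrets link default "$secret_name" --for=pull -n "$namespace"
}
```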

> **Note:** In addition to the Valohai application, you will have separate pods for database (`postgres`), job queue (`redis`), and Bayesian optimization (`optimo`). These images are publicly available, so no changes are needed to those YAML files.

## Create Project/Namespace

Create a namespace for Valohai:

```shell
oc new-project valohai
# or
oc create namespace valohai
```

## Deploy Valohai

Apply all YAML files in the repository directory:

```shell
kubectl apply -f . -n valohai
# or
oc apply -f . -n valohai
```
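To catch manifest errors before anything is created, you can first submit the files with a server-side dry run, which validates them against the API server without persisting them. A sketch; `deploy_valohai` is a hypothetical wrapper:

```shell
# Hypothetical helper: validate every manifest in DIR with a server-side
# dry run (nothing is persisted), and apply for real only if that succeeds.
deploy_valohai() {
  local dir="${1:-.}" namespace="valohai"
  oc apply -f "$dir" -n "$namespace" --dry-run=server &&
    oc apply -f "$dir" -n "$namespace"
}
```

Run `deploy_valohai .` from the repository directory.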

### Verify Deployment

Check that resources are up:

```shell
oc get pods -n valohai
oc get deployments -n valohai
oc get services -n valohai
```

You should see pods for `valohai`, `postgres`, `redis`, and `optimo` running.

Wait for all pods to be in `Running` state:

```shell
oc get pods -n valohai -w
```

Press Ctrl+C when all pods are running.
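For scripted installs, `oc rollout status` can replace the interactive watch. A sketch assuming the Deployments are named `postgres`, `redis`, `optimo`, and `valohai`, matching the pod names above (adjust if your templates differ); `wait_for_valohai` is hypothetical:

```shell
# Hypothetical helper: wait up to 10 minutes per Deployment for its rollout
# to finish; stop at the first one that fails or times out.
wait_for_valohai() {
  local deployment
  for deployment in postgres redis optimo valohai; do
    echo "Waiting for deployment/${deployment}..."
    oc rollout status "deployment/${deployment}" -n valohai --timeout=600s || return 1
  done
}
```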

## Create Admin User

After the Valohai pods are running, create an admin user to log into the web interface.

**1. Shell into the Valohai pod:**

```shell
POD_NAME=$(oc get pod -n valohai -l app=valohai -o jsonpath='{.items[0].metadata.name}')
oc rsh -n valohai "$POD_NAME"
```

**2. Run the initialization command:**

```shell
python manage.py roi_init --mode dev
```

This creates an admin account with credentials printed to stdout. **Save these credentials securely.**

**3. Exit the pod:**

```shell
exit
```

Or press Ctrl+D.
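If you prefer not to open an interactive shell, the same initialization command can be run with `oc exec`. A sketch that reuses the `app=valohai` label selector from step 1; `create_admin` is a hypothetical helper:

```shell
# Hypothetical helper: run roi_init in the first pod labelled app=valohai
# without an interactive shell. The admin credentials it prints appear in
# your terminal; store them securely.
create_admin() {
  local pod
  pod="$(oc get pod -n valohai -l app=valohai -o jsonpath='{.items[0].metadata.name}')" || return 1
  if [ -z "$pod" ]; then
    echo "No pod with label app=valohai found" >&2
    return 1
  fi
  oc exec -n valohai "$pod" -- python manage.py roi_init --mode dev
}
```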

## Expose the Valohai Web App

In OpenShift, use Routes to expose services externally.

### Create Route

```shell
oc expose svc/valohai -n valohai
```

### Get the Route

```shell
oc get routes -n valohai
```

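OpenShift generates a hostname for the route unless you specify one with `--hostname` (for example, `oc expose svc/valohai --hostname=valohai.yourdomain.com -n valohai`). You can access your Valohai web UI at that address. The hostname should match `URL_BASE` in `roi-config-configmap.yaml`; if it does not, update the ConfigMap, apply it again, and restart the application with `oc rollout restart deployment/valohai -n valohai`.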
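A minimal sketch for comparing the route hostname with `URL_BASE`, assuming the route is named `valohai` (the default name `oc expose svc/valohai` gives it) and `URL_BASE` is written as a quoted `https://` or `http://` URL; `check_url_base` is hypothetical:

```shell
# Hypothetical helper: compare the route hostname with the host part of
# URL_BASE in the ConfigMap file (default: roi-config-configmap.yaml).
check_url_base() {
  local config="${1:-roi-config-configmap.yaml}" host url_host
  host="$(oc get route valohai -n valohai -o jsonpath='{.spec.host}')" || return 1
  # Keep only the host of "http(s)://<host>/..." and drop any path.
  url_host="$(sed -n 's/^ *URL_BASE: *"https\{0,1\}:\/\/\([^/"]*\).*/\1/p' "$config")"
  if [ "$host" = "$url_host" ]; then
    echo "URL_BASE matches route host ${host}"
  else
    echo "Route host is '${host}' but URL_BASE points to '${url_host}'" >&2
    return 1
  fi
}
```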

### Configure HTTPS/TLS

By default, `oc expose` creates an unencrypted HTTP route. Since `URL_BASE` in the example configuration uses `https://`, configure a secured route with TLS termination.

Refer to [OpenShift's documentation on creating secure routes](https://docs.openshift.com/container-platform/4.17/networking/routes/secured-routes.html).
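One of the options described there is edge termination, where the OpenShift router terminates TLS. A sketch that switches the existing `valohai` route to edge termination with the router's default certificate and redirects plain HTTP to HTTPS; `enable_edge_tls` is hypothetical, and using your own certificate requires additional route fields covered in the linked documentation:

```shell
# Hypothetical helper: patch the existing `valohai` route to edge TLS termination.
# The router serves HTTPS with its default certificate and redirects HTTP to HTTPS.
enable_edge_tls() {
  oc patch route valohai -n valohai --type=merge \
    -p '{"spec":{"tls":{"termination":"edge","insecureEdgeTerminationPolicy":"Redirect"}}}'
}
```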

## Set Up Workers

Valohai needs workers to run your machine learning workloads. You have several options:

### OpenShift/Kubernetes Workers

For easier installation of OpenShift workers, we recommend using Helm.

**Install with Helm:**

A Helm chart is available to install Valohai workers to OpenShift clusters.

Contact your Valohai representative to receive the required `custom-values.yaml` file.

```shell
helm repo add valohai --force-update https://dist.valohai.com/charts/
helm upgrade --install \
    -n valohai-workers \
    --create-namespace \
    valohai-workers \
    valohai/valohai-workers \
    -f custom-values.yaml
```

Once the installation is complete, share the installer output with the Valohai team, along with the connection details for your Kubernetes API (hostname and port).

> **Note:** If Helm returns before all resources are fully initialized, the installer output may be incomplete and contain placeholders. Wait a moment and rerun the command to get the complete output.

### Alternative Worker Options

You can also use:

* **On-premises servers:** [Ubuntu installer](https://github.com/valohai/dokuhai/blob/main/on-premises/linux-workers.md) or a manual installation
* **Autoscaled EC2 instances:** [AWS hybrid deployment](https://github.com/valohai/dokuhai/blob/main/aws/hybrid.md)

> **Important:** Workers must be able to reach the Redis queue set up in your cluster during this installation, on port 6379. For workers running outside the cluster, make sure this port is reachable from their network.

## Set Up Data Store

Valohai requires an S3-compatible data store. Options include:

**MinIO on the cluster:**

* [MinIO for Kubernetes](https://min.io/docs/minio/kubernetes/upstream/index.html)

**S3 bucket:**

* An S3-compatible bucket in your account

Discuss with your Valohai contact which option best fits your needs.

## Database and Redis Options

### In-Cluster (Development)

The YAML templates include PostgreSQL and Redis deployments.

**Use for:** Development and testing environments

**Considerations:**

* Requires persistent volume management
* Manual backup procedures
* Less robust for production

### Managed Services (Production)

For production, consider using managed services:

**Amazon RDS** for PostgreSQL:

* Automated backups
* Multi-AZ high availability
* Managed updates

**Amazon ElastiCache** for Redis:

* Automated failover
* Managed scaling
* Better performance

If using managed services:

1. Remove the in-cluster `postgres-deployment.yaml` and `redis-deployment.yaml` from the directory before running `oc apply -f .`
2. Update `DATABASE_URL` and `REDIS_URL` in `roi-config-configmap.yaml` to point to your managed services
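A sketch for assembling the two connection strings, assuming the same URL formats as the in-cluster defaults in `roi-config-configmap.yaml` and alphanumeric passwords (so no URL-encoding is needed); `database_url`, `redis_url`, and the endpoint hostnames are placeholders:

```shell
# Hypothetical helpers: build connection strings for managed services.
# database_url <host> <password> [user] [database] [port]
database_url() {
  printf 'postgresql://%s:%s@%s:%s/%s\n' "${3:-postgres}" "$2" "$1" "${5:-5432}" "${4:-valohai}"
}

# redis_url <host> [port] [db-number]
redis_url() {
  printf 'redis://%s:%s/%s\n' "$1" "${2:-6379}" "${3:-0}"
}

database_url db.example.com Abc123xyz   # prints postgresql://postgres:Abc123xyz@db.example.com:5432/valohai
redis_url cache.example.com             # prints redis://cache.example.com:6379/0
```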

## Monitoring

### View Pod Logs

```shell
oc logs -f deployment/valohai -n valohai
```

### Check Pod Status

```shell
oc get pods -n valohai
oc describe pod <pod-name> -n valohai
```

### Check Resource Usage

```shell
oc adm top pods -n valohai
oc adm top nodes
```

## Troubleshooting

### Pods Not Starting

**Check pod status:**

```shell
oc get pods -n valohai
oc describe pod <pod-name> -n valohai
```

**Common issues:**

* Image pull errors (check registry credentials)
* Insufficient resources (check node capacity)
* Failed health checks (check application logs)

### Database Connection Errors

**Verify service:**

```shell
oc get svc postgres -n valohai
```

**Test the connection from a temporary pod (`psql` prompts for the `POSTGRES_PASSWORD` value):**

```shell
oc run -it --rm debug --image=postgres:14 --restart=Never -n valohai -- psql -h postgres -U postgres
```
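The `psql` check above is interactive. For a quick non-interactive probe of both backing services, short-lived pods can run `pg_isready` and `redis-cli ping` instead. A sketch assuming the `postgres:14` and `redis:7` images can be pulled from your cluster; `check_backends` and the pod names are hypothetical:

```shell
# Hypothetical helper: probe Postgres and Redis from throwaway pods in the namespace.
# pg_isready exits 0 when the server accepts connections; redis-cli ping prints PONG.
check_backends() {
  oc run pg-check -n valohai --rm -i --restart=Never --image=postgres:14 -- \
    pg_isready -h postgres -p 5432 || return 1
  oc run redis-check -n valohai --rm -i --restart=Never --image=redis:7 -- \
    redis-cli -h redis -p 6379 ping || return 1
}
```

When both services are reachable, `pg_isready` reports `accepting connections` and `redis-cli` prints `PONG`.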

### Cannot Access Web UI

**Check route:**

```shell
oc describe route valohai -n valohai
```

**Verify service:**

```shell
oc get svc valohai -n valohai
```

**Check pod health:**

```shell
oc get pods -n valohai -l app=valohai
```

## Getting Help

**Valohai Support:** <support@valohai.com>

**Include in support requests:**

* OpenShift version
* Pod logs: `oc logs <pod-name> -n valohai`
* Pod descriptions: `oc describe pod <pod-name> -n valohai`
* Recent events: `oc get events -n valohai --sort-by='.lastTimestamp'`
* Description of the issue and when it started
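To collect most of these items in one step, a sketch that writes the OpenShift version, namespace events, and each pod's logs and description into a single directory; `collect_diagnostics` and the directory name are hypothetical:

```shell
# Hypothetical helper: save version info, events, and per-pod logs and
# descriptions from the valohai namespace into one directory.
collect_diagnostics() {
  local ns="valohai" out="${1:-valohai-diagnostics-$(date +%Y%m%d-%H%M%S)}" pod name
  mkdir -p "$out" || return 1

  oc version > "$out/oc-version.txt" 2>&1
  oc get events -n "$ns" --sort-by='.lastTimestamp' > "$out/events.txt" 2>&1

  # `oc get pods -o name` prints entries such as "pod/valohai-abc123".
  for pod in $(oc get pods -n "$ns" -o name); do
    name="${pod#pod/}"
    oc logs "$name" -n "$ns" --all-containers > "$out/${name}.log" 2>&1
    oc describe pod "$name" -n "$ns" > "$out/${name}.describe.txt" 2>&1
  done
  echo "Diagnostics written to $out"
}
```

Attach the resulting directory (or an archive of it) to your support request, together with a description of the issue.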


