This guide will go through how to set up a self-hosted installation of Valohai in your AWS EKS cluster.
In the Valohai self-hosted model, all components of Valohai run inside your own network. This means that your users won't use app.valohai.com to manage their ML projects, but a version of Valohai that's hosted by you.
Updates to the platform are delivered through Docker images.
Prerequisites
- Existing Kubernetes cluster, such as AWS EKS
- Install AWS CLI
- Configure an AWS profile that can access the EKS cluster from your CLI
- The VPC where the cluster resides should have at least two subnets for the load balancer.
- Install kubectl
- Install helm
- Karpenter installed on the cluster. Other autoscaling options may also work, but these instructions have been tested with Karpenter.
Set up the Valohai application and related components
Clone the Git repository
Start by cloning the public GitHub repository to get the YAML files required for the installation.
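Cloning might look like the following; the repository URL and directory name are placeholders, use the address provided by your Valohai contact:

```shell
# Clone the repository containing the installation YAML files
# (replace <repository-url> and <repository-directory> with the actual values)
git clone <repository-url>
cd <repository-directory>
```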
Set up the configmaps
Provide the following values in the db-config-configmap.yaml:
- POSTGRES_PASSWORD - upper- and lowercase letters and numbers allowed
Provide the following values in the optimo-deployment.yaml:
- OPTIMO_BASIC_AUTH_PASSWORD - upper- and lowercase letters and numbers allowed
Provide the following values in the roi-config-configmap.yaml:
- PASSWORD in DATABASE_URL - has to match the POSTGRES_PASSWORD in db-config-configmap.yaml
- SECRET_KEY, REPO_PRIVATE_KEY_SECRET, STATS_JWT_KEY - upper- and lowercase letters and numbers allowed
- OPTIMO_BASIC_AUTH_PASSWORD - has to match OPTIMO_BASIC_AUTH_PASSWORD in optimo-deployment.yaml
- URL_BASE - the address the users will use to access your Valohai installation, e.g. https://mycompany.valohai.app.com
Note that there are various setup options for the roi-config-configmap.yaml. Please discuss any specific requirements, such as login options or connecting an email server, with your Valohai contact.
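As an illustration, a database configmap could look like the sketch below. Only the POSTGRES_PASSWORD key is confirmed by this guide; the configmap name and surrounding structure are assumptions, so follow the file shipped in the repository:

```yaml
# Illustrative sketch of db-config-configmap.yaml (structure is an
# assumption; only POSTGRES_PASSWORD is named by this guide).
apiVersion: v1
kind: ConfigMap
metadata:
  name: db-config
data:
  # Upper- and lowercase letters and numbers allowed. Must match the
  # PASSWORD part of DATABASE_URL in roi-config-configmap.yaml.
  POSTGRES_PASSWORD: "<your-password>"
```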
Set up the Valohai Docker image
To run the Valohai application, you will need a Docker image, which your Valohai contact will provide. After downloading the image, push it to a registry that your cluster can access. Make sure to update the image in valohai-deployment.yaml to point to the correct registry and image.
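The relevant part of the deployment might look like the fragment below; the container name and registry path are placeholders, not the exact contents of the shipped valohai-deployment.yaml:

```yaml
# Fragment of valohai-deployment.yaml (container name is an assumption).
# Point "image" at the registry where you pushed the Valohai image.
spec:
  template:
    spec:
      containers:
        - name: valohai
          image: <your-registry>/<valohai-image>:<tag>
```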
Which Docker images are required for deploying Valohai?
In addition to the Valohai application, you will have separate pods running for the database (postgres), the job queue (redis) and the Valohai service for Bayesian optimization (optimo). All of these images are publicly available, so you do not need to change anything in the YAML files.
Set up the namespace
By default, the resources here will be deployed to the default namespace. If you want to use another namespace, make sure to add it to all the YAML files before applying them.
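If you do use a custom namespace, one approach (assuming a namespace named valohai, which is an example, not a requirement) is:

```shell
# Create the namespace and apply all manifests into it.
# Remember to also set metadata.namespace in the YAML files
# so that cross-referencing resources resolve correctly.
kubectl create namespace valohai
kubectl apply -f . -n valohai
```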
Set up the nodepool
Applying the nodepool.yaml
will create Kubernetes nodepool that can be used by Karpenter to scale up nodes. You can modify the configuration based on your needs. Note that for running the Valohai application and other related components, we recommend a node with at least 4 CPUs and 16 GB RAM (for example AWS m5.xlarge).
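For orientation, a Karpenter NodePool fragment could look like the sketch below. This is not the shipped nodepool.yaml; the instance type and CPU limit are example values chosen to satisfy the 4 CPU / 16 GB RAM recommendation:

```yaml
# Illustrative Karpenter NodePool fragment (example values only;
# use the nodepool.yaml from the repository as your starting point).
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: valohai-nodepool
spec:
  template:
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.xlarge"]   # >= 4 CPUs / 16 GB RAM, as recommended
  limits:
    cpu: "32"                      # example cap on total provisioned CPU
```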
Define the subnets in ingress.yaml
You will need to provide the IDs of at least two subnets for alb.ingress.kubernetes.io/subnets in ingress.yaml. These are used for the load balancer that will be set up for accessing the Valohai web UI.
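The annotation takes a comma-separated list of subnet IDs; the fragment below is illustrative, with placeholder IDs:

```yaml
# Fragment of ingress.yaml (subnet IDs are placeholders).
metadata:
  annotations:
    alb.ingress.kubernetes.io/subnets: subnet-<id-1>, subnet-<id-2>
```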
Deploy the Valohai setup
You can deploy the Valohai setup by running kubectl apply -f . in the repository directory. After this, verify that the deployments, services and pods are available for valohai, postgres, redis and optimo:
kubectl get pods -n <namespace>
kubectl get deployments -n <namespace>
kubectl get services -n <namespace>
Set up the loadbalancer
Before adding the load balancer, you will need to set up the IAM policy and an IAM service account. Note that we are using eksctl here for creating the service account. For more information, refer to the AWS documentation.
curl -O https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.11.0/docs/install/iam_policy.json
aws iam create-policy \
--policy-name AWSLoadBalancerControllerIAMPolicy \
--policy-document file://iam_policy.json
eksctl create iamserviceaccount \
--cluster=<name-of-your-cluster> \
--namespace=kube-system \
--name=aws-load-balancer-controller \
--role-name AmazonEKSLoadBalancerControllerRole \
--attach-policy-arn=arn:aws:iam::<your-account-id>:policy/AWSLoadBalancerControllerIAMPolicy \
--approve
We recommend using Helm for setting up the load balancer.
Start by adding the eks chart repository:
helm repo add eks https://aws.github.io/eks-charts
Next, run the following command to install the aws-load-balancer-controller. Make sure to fill in the name of your cluster, the AWS region and your VPC ID.
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
--set 'clusterName=<name-of-your-cluster>' \
--set serviceAccount.create=false \
--set 'serviceAccount.name=aws-load-balancer-controller' \
--set 'region=<AWS-region>' --set 'vpcId=<your-vpc-id>' -n kube-system
Now, if you run kubectl get pods -n kube-system, you should see two aws-load-balancer-controller-<id> pods in the kube-system namespace. Once they are up and running, you can run kubectl get ingress to get the address for accessing your Valohai installation.
Set up the Valohai workers
In order to set up the workers that will run your workloads, you can refer to the respective guides in this documentation:
- Kubernetes workers
- On-premises servers: Ubuntu installer or manual install
- Autoscaled EC2 instances
Note that regardless of the installation method, the workers need to be able to connect on port 6379 to the Redis queue set up in your cluster during this installation.
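To verify that a worker host can reach the queue, a quick TCP connectivity check (where <redis-host> is a placeholder for however your Redis service is exposed to the workers) could be:

```shell
# Check TCP connectivity from a worker host to the Redis queue;
# <redis-host> is a placeholder for your Redis service address.
nc -zv <redis-host> 6379
```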
Set up a data store
Valohai requires an S3-compatible data store. This can be, for example, MinIO running on the cluster or an S3 bucket in your AWS account. Discuss with your Valohai contact which option best fits your needs.
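If you opt for an S3 bucket, creating one might look like the following; the bucket name and region are placeholders:

```shell
# Create an S3 bucket for Valohai data (name and region are examples).
# Note: for us-east-1, omit the --create-bucket-configuration flag.
aws s3api create-bucket \
  --bucket <my-valohai-data> \
  --region <AWS-region> \
  --create-bucket-configuration LocationConstraint=<AWS-region>
```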