EKS for Real-Time Inference

Configure your Amazon EKS cluster to deploy Valohai models for real-time inference.

Note: This guide is specifically for real-time inference deployments. It does not enable using Kubernetes for standard Valohai workers. For Kubernetes workers, see our installation guide.

Overview

Valohai can push deployments to an existing EKS cluster using standard Kubernetes APIs.

Requirements:

  • app.valohai.com (34.248.245.191, 63.34.156.112) must be able to access your cluster's API server over HTTPS

  • Your cluster can be configured to serve only private deployment endpoints

Prerequisites

Existing infrastructure:

  • EKS cluster

  • kubectl configured to access your cluster

Tools:

  • kubectl installed

  • AWS CLI installed and configured

Step 1: Install ingress-nginx

Install the NGINX Ingress Controller on your cluster.

Follow the installation guide: https://kubernetes.github.io/ingress-nginx/deploy/

Get External IP

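After installation, get the external address of the ingress-nginx controller service, shown in the EXTERNAL-IP column. On EKS this is typically a load balancer DNS name rather than a plain IP address. You'll need to share this with Valohai.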

kubectl -n ingress-nginx get service/ingress-nginx-controller
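
To capture only the address, without the surrounding table, a jsonpath query can help. The following is a minimal sketch, assuming the default service name and namespace used by the ingress-nginx installation guide; ingress_address is a hypothetical helper name, not part of kubectl or Valohai.

```shell
# Print the external address of the ingress-nginx controller service.
# Tries the load balancer hostname first (typical on AWS), then the IP.
ingress_address() {
    addr=$(kubectl -n ingress-nginx get service/ingress-nginx-controller \
        -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
    if [ -z "$addr" ]; then
        addr=$(kubectl -n ingress-nginx get service/ingress-nginx-controller \
            -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    fi
    printf '%s\n' "$addr"
}
```

Run ingress_address once the load balancer has been provisioned; an empty result usually means the address has not been assigned yet.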

Step 2: Create Kubernetes Service Account

Create a service account that Valohai will use to manage deployments.

kubectl create serviceaccount valohai-deployment

Create Service Account Token

For Kubernetes 1.24 and higher, token Secrets are no longer created automatically for service accounts, so create one manually:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: valohai-deployment-token
  namespace: <NAMESPACE HERE>
  annotations:
    kubernetes.io/service-account.name: valohai-deployment
EOF

Replace <NAMESPACE HERE> with the namespace where you created the service account (default if you did not specify one).

Get the Token

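Confirm that the service account exists, then read and decode the token. If you used a namespace other than default, add -n <NAMESPACE HERE> to both commands: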

kubectl get serviceaccounts valohai-deployment -o json
kubectl get secret valohai-deployment-token -o jsonpath='{.data.token}' | base64 --decode

Save this token value to provide to Valohai.
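
Kubernetes service account tokens are JSON Web Tokens, so a quick shape check can catch a truncated or badly decoded value before you send it. This is a minimal sketch; looks_like_jwt is a hypothetical helper, and it checks the format only, not the signature or the permissions.

```shell
# Return success if the argument has the three dot-separated, non-empty
# segments of a JWT (header.payload.signature). Shape check only.
looks_like_jwt() {
    case "$1" in
        *.*.*.*) return 1 ;;      # more than three segments
        .*|*.|*..*) return 1 ;;   # an empty segment
        *.*.*) return 0 ;;        # exactly three segments
        *) return 1 ;;
    esac
}

# Example usage (commented out because it needs cluster access):
# TOKEN=$(kubectl get secret valohai-deployment-token \
#     -o jsonpath='{.data.token}' | base64 --decode)
# looks_like_jwt "$TOKEN" && echo "token looks well-formed"
```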

Step 3: Create Kubernetes Role

Create a role that defines the permissions Valohai needs.

Create a file valohai-deployment-role.yml:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: valohai-deployment-role
rules:
  - apiGroups: [""]
    resources: ["events", "namespaces"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods", "pods/log", "services"]
    verbs: ["create", "delete", "deletecollection", "get", "list", "patch", "update", "watch"]
  - apiGroups: ["apps", "extensions"]
    resources: ["deployments", "deployments/rollback", "deployments/scale"]
    verbs: ["create", "delete", "deletecollection", "get", "list", "patch", "update", "watch"]
  - apiGroups: ["extensions", "networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["create", "delete", "deletecollection", "get", "list", "patch", "update", "watch"]

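A Role always applies to a single namespace. Without an explicit namespace, kubectl creates it in your current namespace; to target a specific namespace, add namespace: <NAMESPACE> under metadata.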

Apply the role:

kubectl apply -f valohai-deployment-role.yml

Step 4: Create Role Binding

Bind the role to the service account.

Replace <namespace> with the namespace of the service account (default if you did not specify one):

kubectl create rolebinding valohai-deployment-binding \
    --role=valohai-deployment-role \
    --serviceaccount=<namespace>:valohai-deployment
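
To confirm that the binding took effect, you can ask the API server what the service account may do by impersonating it with kubectl auth can-i. This is a sketch under assumptions: check_valohai_rbac is a hypothetical helper, it samples only a few of the permissions from the role, and your own user must be allowed to impersonate service accounts.

```shell
# Check a sample of the permissions granted by valohai-deployment-role.
# Argument: namespace of the service account (defaults to "default").
check_valohai_rbac() {
    ns=${1:-default}
    sa="system:serviceaccount:${ns}:valohai-deployment"
    for check in "create deployments.apps" "create services" \
                 "create ingresses.networking.k8s.io" "list pods"; do
        # $check is intentionally unquoted so it splits into verb and resource.
        # can-i exits non-zero on "no", so keep going regardless.
        result=$(kubectl auth can-i $check --as="$sa" -n "$ns") || true
        printf '%-40s %s\n' "$check" "$result"
    done
}
```

Each line should end in yes. A no usually points to a role, binding, and service account that are not in the same namespace.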

Step 5: Configure Repository Access

Ensure your cluster's nodes can pull from the repository that Valohai pushes images to.

AWS IAM User for ECR

Create an IAM user that Valohai can use to access the cluster and push to your ECR.

1. Create IAM user

Navigate to IAM in AWS Console and create a user named valohai-eks-user.

  • Enable Programmatic access and Console access

2. Attach policies

Attach these existing policies:

  • AmazonEC2ContainerRegistryFullAccess

  • AmazonEKSServicePolicy

3. Create custom policy

Create a new policy named VH_EKS_USER:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Action": "eks:ListClusters",
            "Resource": "*"
        }
    ]
}

4. Attach custom policy

Go back to the user, refresh, and attach the VH_EKS_USER policy.

5. Save credentials

Store the access key and secret in a safe place. You'll provide these to Valohai.
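
If you prefer the AWS CLI to the console, the steps above can be sketched roughly as follows. This assumes the custom policy JSON has been saved locally as vh-eks-user-policy.json (a hypothetical file name), that your CLI credentials may manage IAM, and it covers programmatic access only. create_valohai_eks_user is a hypothetical wrapper name.

```shell
# CLI sketch of the console steps: create the user, attach the managed
# policies, create and attach the custom policy, then issue an access key.
create_valohai_eks_user() {
    account_id=$(aws sts get-caller-identity --query Account --output text)

    aws iam create-user --user-name valohai-eks-user

    # AWS managed policies named in step 2.
    for policy in AmazonEC2ContainerRegistryFullAccess AmazonEKSServicePolicy; do
        aws iam attach-user-policy --user-name valohai-eks-user \
            --policy-arn "arn:aws:iam::aws:policy/${policy}"
    done

    # Custom policy from step 3 (file name is an assumption).
    aws iam create-policy --policy-name VH_EKS_USER \
        --policy-document file://vh-eks-user-policy.json
    aws iam attach-user-policy --user-name valohai-eks-user \
        --policy-arn "arn:aws:iam::${account_id}:policy/VH_EKS_USER"

    # Prints the access key ID and secret once; store them securely.
    aws iam create-access-key --user-name valohai-eks-user
}
```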

Alternative: Other Container Registries

You can use standard Docker login (username/password) credentials for:

  • Azure Container Registry

  • GitLab Container Registry

  • Artifactory

  • Docker Hub

  • Other registries

Create a separate account for Valohai to push to your repository.

Step 6: Collect Information

Gather the following information to send to Valohai:

Cluster details:

Find these on your cluster's page in EKS:

  • Cluster name: ____________

  • AWS region: ____________

  • API server endpoint: ____________

  • Cluster ARN: ____________

  • Certificate authority (cluster-certificate-data): ____________
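
The same cluster details can also be read with the AWS CLI instead of the console. A minimal sketch, assuming your CLI credentials can describe the cluster; eks_cluster_details is a hypothetical helper name.

```shell
# Print the cluster name, API server endpoint, ARN, and certificate
# authority data in one call.
# Arguments: cluster name, AWS region.
eks_cluster_details() {
    aws eks describe-cluster --name "$1" --region "$2" --output json \
        --query 'cluster.{name: name, endpoint: endpoint, arn: arn, certificateAuthority: certificateAuthority.data}'
}

# Example usage (commented out because it needs AWS access):
# eks_cluster_details CLUSTER_NAME REGION
```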

Service account:

  • valohai-deployment service account token: ____________

Networking:

  • External IP of ingress-nginx: ____________

Get the external IP with:

kubectl get service/ingress-nginx-controller -n ingress-nginx

Optional - ALB:

If you have an ALB that points to the Kubernetes API and presents a certificate from a publicly trusted certificate authority, provide the ALB address instead of direct cluster API access.

Container registry:

ECR:

  • ECR name/URL: ____________

    • Example: accountid.dkr.ecr.eu-west-1.amazonaws.com

    • Find this when creating a new repository in ECR
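
The registry URL follows the pattern shown in the example, so it can be assembled from your account ID and region. A small sketch, assuming a standard commercial AWS region (other partitions use a different domain suffix); ecr_registry_url is a hypothetical helper name.

```shell
# Build the ECR registry URL from an AWS account ID and a region.
ecr_registry_url() {
    printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Example usage (commented out because it needs AWS access):
# ecr_registry_url "$(aws sts get-caller-identity --query Account --output text)" eu-west-1
```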

IAM credentials:

  • valohai-eks-user access key ID: ____________

  • valohai-eks-user secret access key: ____________

Summary

Send the collected information to your Valohai contact at [email protected] using your organization's secure communication method.

Your Valohai contact will complete the configuration on the platform side.

Next Steps

After Valohai confirms the setup:

1. Test deployments

  • Create a deployment in Valohai

  • Verify it deploys to your EKS cluster

  • Check that the endpoint is accessible

2. Configure access controls

  • Review security groups for the load balancer

  • Configure authentication for endpoints if needed

  • Set up network policies in Kubernetes

3. Monitor deployments

  • Set up CloudWatch monitoring for your deployments

  • Configure alerts for deployment health

  • Review resource usage and costs

Troubleshooting

Cannot access cluster API

Verify API endpoint:

aws eks describe-cluster --name CLUSTER_NAME --query "cluster.endpoint"

Check security group:

Ensure the cluster security group allows access from Valohai IPs (34.248.245.191/32, 63.34.156.112/32).
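
Besides security groups, an EKS public endpoint can be restricted by its own list of allowed CIDR blocks, so it may be worth inspecting that as well. A sketch, assuming your CLI credentials can describe the cluster; eks_api_access is a hypothetical helper name.

```shell
# Show whether the API server endpoint is public and/or private, and which
# CIDR blocks may reach the public endpoint.
# Arguments: cluster name, AWS region.
eks_api_access() {
    aws eks describe-cluster --name "$1" --region "$2" --output json \
        --query 'cluster.resourcesVpcConfig.{public: endpointPublicAccess, private: endpointPrivateAccess, cidrs: publicAccessCidrs}'
}

# Example usage (commented out because it needs AWS access):
# eks_api_access CLUSTER_NAME REGION
```

If public access is enabled and the CIDR list is restricted, the Valohai IPs listed above need to be covered by it.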

Service account token invalid

Regenerate token:

Delete and recreate the secret:

kubectl delete secret valohai-deployment-token
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: valohai-deployment-token
  namespace: default
  annotations:
    kubernetes.io/service-account.name: valohai-deployment
EOF

Get new token:

kubectl get secret valohai-deployment-token -o jsonpath='{.data.token}' | base64 --decode
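
To check the new token before sending it, you can call the Kubernetes API with it directly. This is a sketch under assumptions: the cluster CA certificate has been decoded from the certificate authority data into a local file, the service account lives in the default namespace, and check_token is a hypothetical helper name.

```shell
# Call the Kubernetes API with the service account token and print the
# HTTP status code.
# Arguments: API server endpoint, token, path to the cluster CA certificate.
#   200: token accepted and allowed to list pods
#   401: token rejected
#   403: token valid but missing permissions
check_token() {
    curl --silent --output /dev/null --write-out '%{http_code}\n' \
        --cacert "$3" \
        --header "Authorization: Bearer $2" \
        "$1/api/v1/namespaces/default/pods"
}

# Example usage (commented out because it needs cluster access):
# check_token "https://API_SERVER_ENDPOINT" "$TOKEN" ca.crt
```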

Ingress not working

Check ingress-nginx status:

kubectl get pods -n ingress-nginx

Check service:

kubectl get svc -n ingress-nginx

Verify load balancer:

Check in the EC2 console that the load balancer is active and that its targets are healthy.

Cannot push to ECR

Test ECR access:

aws ecr get-login-password --region REGION | docker login --username AWS --password-stdin ACCOUNT.dkr.ecr.REGION.amazonaws.com

Verify IAM permissions:

Ensure valohai-eks-user has AmazonEC2ContainerRegistryFullAccess.
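
The attached policies can be listed from the command line. A minimal sketch, assuming your CLI credentials may read IAM; list_valohai_user_policies is a hypothetical helper name.

```shell
# List the names of the managed policies attached to valohai-eks-user.
list_valohai_user_policies() {
    aws iam list-attached-user-policies --user-name valohai-eks-user \
        --query 'AttachedPolicies[].PolicyName' --output text
}
```

The output should include AmazonEC2ContainerRegistryFullAccess along with the other policies attached in Step 5.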

Deployments fail

Check deployment logs:

kubectl logs -l app=your-deployment -n default

Check pod events:

kubectl describe pod POD_NAME -n default
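
When you do not yet know which pod is failing, the most recent events in the namespace are often a quicker starting point. A small sketch; recent_events is a hypothetical helper name, and the default of 20 lines is arbitrary.

```shell
# Show the latest events in a namespace, oldest first.
# Arguments: namespace (default "default"), number of lines (default 20).
recent_events() {
    kubectl get events -n "${1:-default}" --sort-by=.lastTimestamp \
        | tail -n "${2:-20}"
}

# Example usage (commented out because it needs cluster access):
# recent_events default 30
```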

Common issues:

  • Image pull errors (check ECR permissions)

  • Resource limits too low

  • Network policies blocking traffic

  • Service ports misconfigured

Getting Help

Valohai Support: [email protected]

Include in support requests:

  • EKS cluster name and region

  • Kubernetes version

  • Service account token status

  • kubectl version

  • Error messages from deployments or Kubernetes events

  • Ingress and service configurations
