EKS for Real-Time Inference

Configure your Amazon EKS cluster to deploy Valohai models for real-time inference.

Note: This guide is specifically for real-time inference deployments. It does not enable using Kubernetes for standard Valohai workers. For Kubernetes workers, see our installation guide.

Overview

Valohai can push deployments to an existing EKS cluster using standard Kubernetes APIs.

Requirements:

  • app.valohai.com (34.248.245.191, 63.34.156.112) must be able to access your cluster's API Server over HTTPS

  • Optional: your cluster can be configured to serve only private deployment endpoints

Prerequisites

Existing infrastructure:

  • EKS cluster

  • kubectl configured to access your cluster

Tools:

  • kubectl installed

  • AWS CLI installed and configured

Step 1: Install Gateway API CRDs and Gateway controller

  • Install Gateway API CRDs: Follow the Getting Started guide in the official Gateway API documentation.

  • Install a Gateway controller: Choose and install a controller from the official implementations list.

  • Create a Gateway: Configure a Gateway for Valohai deployments to use. It serves as the base URL (domain) for all Valohai deployment endpoints and handles TLS termination for https://. Refer to your chosen controller's documentation for the exact configuration.

  • Grant permissions to manage HTTP routes: See the httproutes resource in Step 3.

Note down the Gateway name, namespace, and whether it uses https:// or http:// — you'll need these in Step 6.
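As a rough illustration of the Gateway described above, the following sketch writes and applies a minimal manifest. The Gateway name `valohai-gateway`, the namespace `valohai`, the class name `example-gateway-class`, and the certificate Secret `valohai-tls` are all hypothetical placeholders. The real `gatewayClassName` and TLS settings depend on the controller you installed, and the namespace must already exist.

```shell
# All names below are hypothetical; take the real gatewayClassName and TLS
# configuration from your Gateway controller's documentation.
cat > valohai-gateway.yml <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: valohai-gateway
  namespace: valohai
spec:
  gatewayClassName: example-gateway-class
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: valohai-tls
      allowedRoutes:
        namespaces:
          from: All   # assumption: routes may attach from any namespace
EOF

kubectl apply -f valohai-gateway.yml
```

With this sketch, the values to note down would be the name `valohai-gateway`, the namespace `valohai`, and the scheme `https://`.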

Step 2: Create Kubernetes Service Account

Create a service account that Valohai will use to manage deployments.
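A minimal sketch of this step, assuming the service account is named `valohai-deployment` (the name used in Step 6) and lives in the `default` namespace:

```shell
NAMESPACE="default"  # assumption: change to the namespace you deploy into

kubectl create serviceaccount valohai-deployment --namespace "$NAMESPACE"
```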

Create Service Account Token

On Kubernetes 1.24 and higher, token Secrets are no longer created automatically for service accounts, so you need to create one yourself.
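One way to do this is to create a Secret of type `kubernetes.io/service-account-token` that references the service account. In this sketch the Secret name `valohai-deployment-token` is an assumption, and the `NAMESPACE` variable stands in for the namespace placeholder.

```shell
NAMESPACE="default"  # stands in for your namespace

# Unquoted EOF so that ${NAMESPACE} is expanded into the manifest.
cat > valohai-deployment-token.yml <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: valohai-deployment-token   # assumed name
  namespace: ${NAMESPACE}
  annotations:
    kubernetes.io/service-account.name: valohai-deployment
type: kubernetes.io/service-account-token
EOF

kubectl apply -f valohai-deployment-token.yml
```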

Replace <NAMESPACE HERE> with your namespace (or use default).

Get the Token

Retrieve the token:
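A sketch of the lookup, assuming the token Secret is named `valohai-deployment-token` and sits in the `default` namespace. Secret data is stored base64-encoded, so the value is decoded before printing.

```shell
NAMESPACE="default"                    # assumption: your namespace
SECRET_NAME="valohai-deployment-token" # assumed Secret name

kubectl get secret "$SECRET_NAME" --namespace "$NAMESPACE" \
  --output jsonpath='{.data.token}' | base64 --decode
```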

Save this token value to provide to Valohai.

Step 3: Create Kubernetes Role

Create a role that defines the permissions Valohai needs.

Create a file valohai-deployment-role.yml:

If you need to limit access to a certain namespace, add namespace: <NAMESPACE> under metadata.

Apply the role:
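The sketch below shows the general shape of such a role and how to apply it. Only the `httproutes` resource is named in this guide; the role name `valohai-deployment-role` and every other resource and verb listed are assumptions for illustration, so confirm the authoritative rule set with Valohai before using it.

```shell
# Illustrative rule set only: httproutes comes from this guide, the remaining
# resources and verbs are assumptions and may not match what Valohai requires.
cat > valohai-deployment-role.yml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: valohai-deployment-role
rules:
  - apiGroups: ["gateway.networking.k8s.io"]
    resources: ["httproutes"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["services", "pods", "pods/log"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
EOF

kubectl apply -f valohai-deployment-role.yml
```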

Step 4: Create Role Binding

Bind the role to the service account.

Replace <namespace> with your namespace if you defined one when creating the service account:
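A possible form of the binding, assuming the role is named `valohai-deployment-role` and using the hypothetical binding name `valohai-deployment-binding`. The `NAMESPACE` variable stands in for the namespace placeholder.

```shell
NAMESPACE="default"  # stands in for your namespace

kubectl create rolebinding valohai-deployment-binding \
  --role=valohai-deployment-role \
  --serviceaccount="${NAMESPACE}:valohai-deployment" \
  --namespace "$NAMESPACE"
```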

Step 5: Configure Repository Access

Ensure your cluster's nodes can pull from the repository that Valohai pushes images to.

AWS IAM User for ECR

Create an IAM user that Valohai can use to access the cluster and push to your ECR.

1. Create IAM user

Navigate to IAM in AWS Console and create a user named valohai-eks-user.

  • Enable Programmatic access and Console access

2. Attach policies

Attach these existing policies:

  • AmazonEC2ContainerRegistryFullAccess

  • AmazonEKSServicePolicy

3. Create custom policy

Create a new policy named VH_EKS_USER:

4. Attach custom policy

Go back to the user, refresh, and attach the VH_EKS_USER policy.

5. Save credentials

Store the access key and secret in a safe place. You'll provide these to Valohai.

Alternative: Other Container Registries

You can use standard Docker login (username/password) credentials for:

  • Azure Container Registry

  • GitLab Container Registry

  • Artifactory

  • Docker Hub

  • Other registries

Create a separate account for Valohai to push to your repository.

Step 6: Collect Information

Gather the following information to send to Valohai:

Cluster details:

Find these on your cluster's page in EKS:

  • Cluster name: ____________

  • AWS region: ____________

  • API server endpoint: ____________

  • Cluster ARN: ____________

  • Certificate authority (cluster-certificate-data): ____________
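If you prefer the command line over the console, the same details can be read with the AWS CLI. In this sketch the cluster name `my-cluster` and the region `eu-west-1` are hypothetical values.

```shell
CLUSTER_NAME="my-cluster"  # hypothetical cluster name
REGION="eu-west-1"         # hypothetical region

# Prints the name, API server endpoint, ARN, and certificate authority data.
aws eks describe-cluster --name "$CLUSTER_NAME" --region "$REGION" \
  --query 'cluster.{name: name, endpoint: endpoint, arn: arn, certificateAuthority: certificateAuthority.data}' \
  --output json
```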

Service account:

  • valohai-deployment service account token: ____________

Networking:

  • Gateway name: ____________

  • Gateway namespace: ____________

  • Base URL scheme (https:// or http://): ____________

Container registry:

ECR:

  • ECR name/URL: ____________

    • Example: accountid.dkr.ecr.eu-west-1.amazonaws.com

    • Find this when creating a new repository in ECR

IAM credentials:

  • valohai-eks-user access key ID: ____________

  • valohai-eks-user secret access key: ____________

Send the collected information to your Valohai contact using your organization's secure communication method.

Your Valohai contact will complete the configuration on the platform side.

Next Steps

After Valohai confirms the setup:

1. Test deployments

  • Create a deployment in Valohai

  • Verify it deploys to your EKS cluster

  • Check that the endpoint is accessible

2. Configure access controls

  • Review security groups for the load balancer

  • Configure authentication for endpoints if needed

  • Set up network policies in Kubernetes

3. Monitor deployments

  • Set up CloudWatch monitoring for your deployments

  • Configure alerts for deployment health

  • Review resource usage and costs

Troubleshooting

Cannot access cluster API

Verify API endpoint:
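One way to check this is to compare the endpoint EKS reports with the one your local kubectl is using. The cluster name and region are hypothetical values.

```shell
CLUSTER_NAME="my-cluster"  # hypothetical cluster name
REGION="eu-west-1"         # hypothetical region

# Endpoint as reported by EKS.
aws eks describe-cluster --name "$CLUSTER_NAME" --region "$REGION" \
  --query 'cluster.endpoint' --output text

# Endpoint kubectl is currently configured to talk to.
kubectl cluster-info
```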

Check security group:

Ensure the cluster security group allows access from Valohai IPs (34.248.245.191/32, 63.34.156.112/32).
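To inspect the rules, you could look up the cluster security group and list the CIDR ranges it allows inbound. This is a sketch with hypothetical cluster name and region; the output should include the two Valohai addresses.

```shell
CLUSTER_NAME="my-cluster"  # hypothetical cluster name
REGION="eu-west-1"         # hypothetical region

# Look up the cluster security group ID.
SG_ID="$(aws eks describe-cluster --name "$CLUSTER_NAME" --region "$REGION" \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text)"

# List every CIDR range allowed by the group's inbound rules.
aws ec2 describe-security-groups --group-ids "$SG_ID" --region "$REGION" \
  --query 'SecurityGroups[].IpPermissions[].IpRanges[].CidrIp' --output text
```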

Service account token invalid

Regenerate token:

Delete and recreate the secret:

Get new token:
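The regeneration steps above can be sketched as follows, assuming the token Secret is named `valohai-deployment-token` and lives in the `default` namespace. Deleting the Secret invalidates the old token, so the new value has to be sent to Valohai again.

```shell
NAMESPACE="default"                    # assumption: your namespace
SECRET_NAME="valohai-deployment-token" # assumed Secret name

# Delete the old Secret, then recreate it for the same service account.
kubectl delete secret "$SECRET_NAME" --namespace "$NAMESPACE"
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${SECRET_NAME}
  namespace: ${NAMESPACE}
  annotations:
    kubernetes.io/service-account.name: valohai-deployment
type: kubernetes.io/service-account-token
EOF

# Read back the newly issued token.
kubectl get secret "$SECRET_NAME" --namespace "$NAMESPACE" \
  --output jsonpath='{.data.token}' | base64 --decode
```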

HTTPRoutes not working

Check that HTTPRoutes were created:

Check Gateway status:
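Both checks above can be run with kubectl. The Gateway name and namespace here are hypothetical; use the values you noted in Step 1. The fully qualified resource names avoid clashes with other CRDs that also define a `gateway` kind.

```shell
GATEWAY_NAME="valohai-gateway"  # hypothetical Gateway name
GATEWAY_NAMESPACE="valohai"     # hypothetical Gateway namespace

# List HTTPRoutes in every namespace.
kubectl get httproutes.gateway.networking.k8s.io --all-namespaces

# Show the Gateway's listeners, conditions, and assigned addresses.
kubectl describe gateways.gateway.networking.k8s.io "$GATEWAY_NAME" \
  --namespace "$GATEWAY_NAMESPACE"
```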

Verify the Gateway controller is running by checking the controller's pods according to your chosen controller's documentation.

Cannot push to ECR

Test ECR access:
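A quick test is to log Docker in to the registry with the credentials Valohai will use. This sketch assumes the AWS CLI is currently configured with the valohai-eks-user access key, and reuses the example registry URL from Step 6 (replace `accountid` and the region with your own).

```shell
REGION="eu-west-1"                                    # hypothetical region
REGISTRY="accountid.dkr.ecr.eu-west-1.amazonaws.com"  # example URL from Step 6

aws ecr get-login-password --region "$REGION" \
  | docker login --username AWS --password-stdin "$REGISTRY"
```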

Verify IAM permissions:

Ensure valohai-eks-user has AmazonEC2ContainerRegistryFullAccess.
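The attached managed policies can be listed with the AWS CLI, run with credentials that are allowed to read IAM:

```shell
IAM_USER="valohai-eks-user"

# Prints the names of all managed policies attached to the user.
aws iam list-attached-user-policies --user-name "$IAM_USER" \
  --query 'AttachedPolicies[].PolicyName' --output text
```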

Deployments fail

Check deployment logs:

Check pod events:
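The two checks above might look like this. The namespace and the deployment name `my-deployment` are hypothetical; list the deployments first to find the real name.

```shell
NAMESPACE="default"         # assumption: namespace Valohai deploys into
DEPLOYMENT="my-deployment"  # hypothetical deployment name

# Find the deployment, then read its most recent log lines.
kubectl get deployments --namespace "$NAMESPACE"
kubectl logs "deployment/${DEPLOYMENT}" --namespace "$NAMESPACE" --tail=100

# Pod details and recent events, oldest first.
kubectl describe pods --namespace "$NAMESPACE"
kubectl get events --namespace "$NAMESPACE" --sort-by=.lastTimestamp
```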

Common issues:

  • Image pull errors (check ECR permissions)

  • Resource limits too low

  • Network policies blocking traffic

  • Service ports misconfigured

Getting Help

Contact Valohai Support.

Include in support requests:

  • EKS cluster name and region

  • Kubernetes version

  • Service account token status

  • kubectl version

  • Error messages from deployments or Kubernetes events

  • Gateway and HTTPRoute configurations
