# Kubernetes Autoscaling

Configure autoscaling for your Kubernetes cluster to dynamically provision resources for Valohai ML workloads.

## Overview

Valohai workers on Kubernetes are implemented as Kubernetes Jobs. With autoscaling configured, your cluster can automatically:

* Scale up nodes when Valohai jobs are queued
* Scale down nodes when jobs complete and resources are idle
* Select appropriate instance types based on job requirements
* Optimize costs by using spot/preemptible instances
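
For example, each queued execution shows up as a worker pod stuck in `Pending` until a suitable node exists; listing those pods is a quick way to see what the autoscaler is reacting to (this sketch assumes the `valohai-workers` namespace used elsewhere in this guide):

```bash
# Worker pods waiting for a node are what trigger scale-up
kubectl get pods -n valohai-workers --field-selector status.phase=Pending
```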

> **Note:** This guide uses AWS EKS with Karpenter as an example, but the concepts apply to any Kubernetes cluster. The same principles work with GKE, AKS, or on-premises Kubernetes using different autoscalers.

## Autoscaling Options

You can use various autoscaling solutions with Valohai:

### Karpenter (Recommended for AWS EKS)

**Best for:** AWS EKS clusters

**Advantages:**

* Fast node provisioning (seconds vs. minutes)
* Flexible instance selection
* Bin-packing optimization
* Direct EC2 API integration

**Cloud support:** AWS (native), Azure and GCP (experimental)

### Cluster Autoscaler

**Best for:** Multi-cloud environments, stable workloads

**Advantages:**

* Cloud-agnostic
* Mature and widely used
* Works with all major cloud providers
* Simple configuration

**Cloud support:** AWS, GCP, Azure, and others

### Cloud-Native Autoscalers

**GKE Autopilot:** Fully managed node provisioning on GKE

**AKS Cluster Autoscaler:** Azure's native autoscaling

**Best for:** Organizations standardized on one cloud provider

## Example: Karpenter on AWS EKS

This section provides a complete example of setting up Karpenter on AWS EKS. If you're using a different cloud provider or autoscaler, adapt these concepts to your environment.

### Requirements

**Existing infrastructure:**

* EKS cluster with Valohai workers installed
* [AWS CLI](https://aws.amazon.com/cli/) installed
* [kubectl](https://kubernetes.io/docs/reference/kubectl/) configured

**Permissions:**

* Admin access to your EKS cluster
* IAM permissions to create roles and policies

### Step 1: Set Up Environment Variables

First, confirm that your cluster has an IAM OIDC provider, then define the common variables used throughout the remaining steps:

```bash
# Check if OIDC is configured
aws iam list-open-id-connect-providers
# Should show: oidc.eks.<region>.amazonaws.com/id/<ID>

export AWS_PROFILE=<aws-profile>
export AWS_REGION=<region>
export KUBECONFIG=~/.kube/<cluster-name>

CLUSTER=<cluster-name>
KARPENTER_NAMESPACE=kube-system
AWS_PARTITION="aws"
OIDC_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER} --query "cluster.identity.oidc.issuer" --output text)"
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
```
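
Before continuing, it's worth checking that the lookups succeeded; empty values here will silently produce broken policies in the next step:

```bash
# Sanity check: all three values should be non-empty
echo "Cluster: ${CLUSTER}"
echo "Account: ${AWS_ACCOUNT_ID}"
echo "OIDC:    ${OIDC_ENDPOINT}"
```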

### Step 2: Create IAM Roles

Create two IAM roles: one for nodes provisioned by Karpenter and one for the Karpenter controller.

**Create node trust policy:**

```bash
echo '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}' > node-trust-policy.json
```

**Create node role:**

```bash
aws iam create-role \
  --role-name "KarpenterNodeRole-${CLUSTER}" \
  --assume-role-policy-document file://node-trust-policy.json

aws iam attach-role-policy \
  --role-name "KarpenterNodeRole-${CLUSTER}" \
  --policy-arn arn:${AWS_PARTITION}:iam::aws:policy/AmazonEKSWorkerNodePolicy

aws iam attach-role-policy \
  --role-name "KarpenterNodeRole-${CLUSTER}" \
  --policy-arn arn:${AWS_PARTITION}:iam::aws:policy/AmazonEKS_CNI_Policy

aws iam attach-role-policy \
  --role-name "KarpenterNodeRole-${CLUSTER}" \
  --policy-arn arn:${AWS_PARTITION}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

aws iam attach-role-policy \
  --role-name "KarpenterNodeRole-${CLUSTER}" \
  --policy-arn arn:${AWS_PARTITION}:iam::aws:policy/AmazonSSMManagedInstanceCore
```
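
To verify that all four managed policies are attached:

```bash
# Should list the AmazonEKS*, AmazonEC2*, and AmazonSSM* policies attached above
aws iam list-attached-role-policies \
  --role-name "KarpenterNodeRole-${CLUSTER}" \
  --output table
```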

**Create controller trust policy:**

```bash
cat << EOF > controller-trust-policy.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT#*//}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "${OIDC_ENDPOINT#*//}:aud": "sts.amazonaws.com",
                    "${OIDC_ENDPOINT#*//}:sub": "system:serviceaccount:${KARPENTER_NAMESPACE}:karpenter"
                }
            }
        }
    ]
}
EOF
```

**Create controller role:**

```bash
aws iam create-role \
  --role-name KarpenterControllerRole-${CLUSTER} \
  --assume-role-policy-document file://controller-trust-policy.json
```

**Create controller policy:**

```bash
cat << EOF > controller-policy.json
{
    "Statement": [
        {
            "Action": [
                "ssm:GetParameter",
                "ec2:DescribeImages",
                "ec2:RunInstances",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeLaunchTemplates",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeInstanceTypeOfferings",
                "ec2:DescribeAvailabilityZones",
                "ec2:DeleteLaunchTemplate",
                "ec2:CreateTags",
                "ec2:CreateLaunchTemplate",
                "ec2:CreateFleet",
                "ec2:DescribeSpotPriceHistory",
                "pricing:GetProducts"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "Karpenter"
        },
        {
            "Action": "ec2:TerminateInstances",
            "Condition": {
                "StringLike": {
                    "ec2:ResourceTag/karpenter.sh/nodepool": "*"
                }
            },
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "ConditionalEC2Termination"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER}",
            "Sid": "PassNodeIAMRole"
        },
        {
            "Effect": "Allow",
            "Action": "eks:DescribeCluster",
            "Resource": "arn:${AWS_PARTITION}:eks:${AWS_REGION}:${AWS_ACCOUNT_ID}:cluster/${CLUSTER}",
            "Sid": "EKSClusterEndpointLookup"
        },
        {
            "Sid": "AllowScopedInstanceProfileCreationActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
                "iam:CreateInstanceProfile"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/kubernetes.io/cluster/${CLUSTER}": "owned",
                    "aws:RequestTag/topology.kubernetes.io/region": "${AWS_REGION}"
                },
                "StringLike": {
                    "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
                }
            }
        },
        {
            "Sid": "AllowScopedInstanceProfileTagActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
                "iam:TagInstanceProfile"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/kubernetes.io/cluster/${CLUSTER}": "owned",
                    "aws:ResourceTag/topology.kubernetes.io/region": "${AWS_REGION}",
                    "aws:RequestTag/kubernetes.io/cluster/${CLUSTER}": "owned",
                    "aws:RequestTag/topology.kubernetes.io/region": "${AWS_REGION}"
                },
                "StringLike": {
                    "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*",
                    "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
                }
            }
        },
        {
            "Sid": "AllowScopedInstanceProfileActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
                "iam:AddRoleToInstanceProfile",
                "iam:RemoveRoleFromInstanceProfile",
                "iam:DeleteInstanceProfile"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/kubernetes.io/cluster/${CLUSTER}": "owned",
                    "aws:ResourceTag/topology.kubernetes.io/region": "${AWS_REGION}"
                },
                "StringLike": {
                    "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*"
                }
            }
        },
        {
            "Sid": "AllowInstanceProfileReadActions",
            "Effect": "Allow",
            "Resource": "*",
            "Action": "iam:GetInstanceProfile"
        }
    ],
    "Version": "2012-10-17"
}
EOF
```

**Attach policy to role:**

```bash
aws iam put-role-policy \
  --role-name KarpenterControllerRole-${CLUSTER} \
  --policy-name KarpenterControllerPolicy-${CLUSTER} \
  --policy-document file://controller-policy.json
```
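
To confirm the controller role and its inline policy are in place:

```bash
# The role ARN is also used as the service account annotation in Step 5
aws iam get-role \
  --role-name "KarpenterControllerRole-${CLUSTER}" \
  --query 'Role.Arn' --output text

# Should list KarpenterControllerPolicy-<cluster-name>
aws iam list-role-policies \
  --role-name "KarpenterControllerRole-${CLUSTER}"
```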

### Step 3: Tag Resources

Tag your node group subnets and the cluster security group so Karpenter can discover which resources to use when launching nodes:

**Tag subnets:**

```bash
for NODEGROUP in $(aws eks list-nodegroups --cluster-name ${CLUSTER} \
    --query 'nodegroups' --output text); do
    aws ec2 create-tags \
        --tags "Key=karpenter.sh/discovery,Value=${CLUSTER}" \
        --resources $(aws eks describe-nodegroup --cluster-name ${CLUSTER} \
        --nodegroup-name $NODEGROUP --query 'nodegroup.subnets' --output text)
done
```

**Tag security group:**

```bash
NODEGROUP=$(aws eks list-nodegroups --cluster-name ${CLUSTER} --query 'nodegroups[0]' --output text)

SECURITY_GROUPS=$(aws eks describe-cluster \
  --name ${CLUSTER} \
  --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" \
  --output text)

aws ec2 create-tags \
    --tags "Key=karpenter.sh/discovery,Value=${CLUSTER}" \
    --resources ${SECURITY_GROUPS}
```
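
To verify that the discovery tag landed on the right resources:

```bash
# Subnets and the cluster security group carrying the discovery tag
aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER}" \
  --query 'Subnets[].SubnetId' --output text

aws ec2 describe-security-groups \
  --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER}" \
  --query 'SecurityGroups[].GroupId' --output text
```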

### Step 4: Update aws-auth ConfigMap

Allow nodes with the KarpenterNodeRole to join the cluster:

```bash
cat << EOF
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER}
      username: system:node:{{EC2PrivateDNSName}}
EOF
```

Add the output under `mapRoles` in the aws-auth ConfigMap:

```bash
kubectl edit configmap aws-auth -n kube-system
```
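
After saving, confirm the new entry is present:

```bash
# The KarpenterNodeRole entry should appear under mapRoles
kubectl get configmap aws-auth -n kube-system -o yaml | \
  grep -B2 -A2 "KarpenterNodeRole-${CLUSTER}"
```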

### Step 5: Deploy Karpenter

**Set Karpenter version:**

```bash
export KARPENTER_VERSION=v0.33.1
```

**Generate Karpenter manifests:**

```bash
helm template karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace "${KARPENTER_NAMESPACE}" \
  --set "settings.clusterName=${CLUSTER}" \
  --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi > karpenter.yaml
```

**Modify affinity rules:**

Edit the affinity section of `karpenter.yaml` so the Karpenter controller runs on your existing node group rather than on nodes it provisions itself. Because the file is applied as-is, replace `${NODEGROUP}` with the node group name from Step 3:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: karpenter.sh/nodepool
          operator: DoesNotExist
      - matchExpressions:
        - key: eks.amazonaws.com/nodegroup
          operator: In
          values:
          - ${NODEGROUP}
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
```

**Deploy Karpenter CRDs:**

```bash
kubectl create -f \
    https://raw.githubusercontent.com/aws/karpenter-provider-aws/${KARPENTER_VERSION}/pkg/apis/crds/karpenter.sh_nodepools.yaml
kubectl create -f \
    https://raw.githubusercontent.com/aws/karpenter-provider-aws/${KARPENTER_VERSION}/pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml
kubectl create -f \
    https://raw.githubusercontent.com/aws/karpenter-provider-aws/${KARPENTER_VERSION}/pkg/apis/crds/karpenter.sh_nodeclaims.yaml
```
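
Verify the CRDs are registered before deploying the controller:

```bash
# Expect nodepools.karpenter.sh, nodeclaims.karpenter.sh,
# and ec2nodeclasses.karpenter.k8s.aws
kubectl get crds | grep -E 'karpenter\.(sh|k8s\.aws)'
```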

**Deploy Karpenter:**

```bash
kubectl apply -f karpenter.yaml
```
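
Check that the controller comes up cleanly before creating node pools (the manifest generated above creates a `karpenter` Deployment):

```bash
# The karpenter Deployment and its pods should reach Ready
kubectl rollout status deployment/karpenter -n "${KARPENTER_NAMESPACE}"
kubectl get pods -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter
```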

### Step 6: Create Node Pools

Create node pools for different workload types.

**CPU Node Pool:**

```bash
cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        name: default
  limits:
    cpu: 100
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-${CLUSTER}"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER}"
EOF
```
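
Confirm both resources were accepted:

```bash
# Both should be listed; no nodes are launched until a job needs one
kubectl get nodepools
kubectl get ec2nodeclasses
```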

**GPU Node Pool (Optional):**

If using GPUs, install the NVIDIA device plugin first:

```bash
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update

# Check version and use it in the command below
helm search repo nvdp --devel

helm upgrade --install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace \
  --version <version>
```
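
Confirm the device plugin DaemonSet is running before scheduling GPU jobs:

```bash
# The plugin advertises nvidia.com/gpu capacity on GPU nodes
kubectl get daemonset -n nvidia-device-plugin
```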

Create GPU node pool:

```bash
cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default-gpu
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["p"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        name: default
      taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: "NoSchedule"
  limits:
    cpu: 100
    memory: 1000Gi
    nvidia.com/gpu: 5
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h
EOF
```
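
Once a GPU execution has triggered provisioning, you can confirm the new node advertises GPU capacity:

```bash
# Nodes provisioned from this pool carry the karpenter.sh/nodepool label
kubectl get nodes -l karpenter.sh/nodepool=default-gpu
kubectl describe node -l karpenter.sh/nodepool=default-gpu | grep -i 'nvidia.com/gpu'
```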

### Step 7: Monitor Scaling

Follow Karpenter logs to see scaling activity:

```bash
kubectl logs -f -n ${KARPENTER_NAMESPACE} -c controller -l app.kubernetes.io/name=karpenter
```

**Test scaling:**

Create a Valohai execution and watch Karpenter provision nodes automatically.
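
While the execution is queued, you can watch Karpenter react; it first creates a NodeClaim, then the corresponding instance registers as a node:

```bash
# In one terminal: watch provisioning attempts
kubectl get nodeclaims -w

# In another terminal: watch Karpenter-managed nodes join
kubectl get nodes -l karpenter.sh/nodepool --watch
```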

## Adapting to Other Environments

The concepts above apply to other Kubernetes environments. Here's how to adapt:

### Google Cloud (GKE)

**Use GKE Cluster Autoscaler:**

```bash
gcloud container clusters update CLUSTER_NAME \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=10 \
  --node-pool=default-pool
```

**Or use GKE Autopilot** for fully managed node provisioning.

### Azure (AKS)

**Use AKS Cluster Autoscaler:**

```bash
az aks update \
  --resource-group RESOURCE_GROUP \
  --name CLUSTER_NAME \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10
```

### On-Premises or Custom Kubernetes

**Use Kubernetes Cluster Autoscaler:**

Install Cluster Autoscaler following [Kubernetes documentation](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler).

Configure it to work with your infrastructure provider (vSphere, OpenStack, etc.).

## Best Practices

### Node Pool Configuration

**Separate pools for different workloads:**

* CPU-intensive: `c` instance family
* Memory-intensive: `r` instance family
* GPU workloads: `p` or `g` instance family
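
For example, a memory-optimized pool could restrict Karpenter to the `r` instance family while reusing the `default` EC2NodeClass from the EKS example above (the pool name and limits here are illustrative):

```bash
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: memory-optimized
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["r"]
      nodeClassRef:
        name: default
  limits:
    cpu: 100
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h
EOF
```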

**Cost optimization:**

* Use spot/preemptible instances for interruptible workloads
* Set appropriate limits to prevent runaway costs
* Configure consolidation for efficient resource usage

### Resource Requests

**Set accurate requests in Valohai:**

* CPU and memory requests help autoscaler make better decisions
* Over-requesting wastes resources
* Under-requesting causes scheduling failures
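
To see what a queued job actually requested, which is what the autoscaler acts on, inspect the pending pod (the pod name below is a placeholder):

```bash
# Requests and limits of the first container in the worker pod
kubectl get pod POD_NAME -n valohai-workers \
  -o jsonpath='{.spec.containers[0].resources}'
```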

### Scaling Parameters

**Balance speed and cost:**

* Fast scale-up for time-sensitive workloads
* Gradual scale-down to avoid thrashing
* Appropriate consolidation policies

## Troubleshooting

### Nodes not scaling up

**Check Karpenter logs:**

```bash
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter
```

**Common issues:**

* IAM permissions insufficient
* No matching node pool for job requirements
* Instance type not available in region
* Subnet or security group not tagged
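
NodeClaims are also worth inspecting; a claim that never becomes a node usually carries the failure reason in its status and events:

```bash
# Each scale-up attempt is recorded as a NodeClaim
kubectl get nodeclaims -o wide
kubectl describe nodeclaim NODECLAIM_NAME
```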

### Nodes not scaling down

**Check disruption settings:**

* Verify consolidation policy
* Check if nodes have workloads preventing disruption
* Review expiration settings
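
To see which workloads are keeping a node alive, list the pods scheduled on it (node name is a placeholder):

```bash
# Pods still running on the node block consolidation
kubectl get pods -A --field-selector spec.nodeName=NODE_NAME
```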

**Force disruption (use with caution):**

```bash
kubectl delete node NODE_NAME
```

### Jobs stuck pending

**Describe the pod:**

```bash
kubectl describe pod POD_NAME -n valohai-workers
```

**Check events:**

```bash
kubectl get events -n valohai-workers --sort-by='.lastTimestamp'
```

**Common issues:**

* Resource requests too large
* No node pool matches requirements
* Taints preventing scheduling

## Getting Help

**Valohai Support:** <support@valohai.com>

**Include in support requests:**

* Kubernetes version
* Autoscaler type and version
* Node pool configurations
* Pod descriptions and events
* Autoscaler logs

**For Karpenter-specific issues:**

* Karpenter logs
* NodePool and EC2NodeClass definitions
* AWS IAM role configuration
