# Hybrid Deployment - Manual Setup

This guide walks through manually creating all AWS resources for a Valohai hybrid deployment. Use this if you can't use CloudFormation or Terraform, or need specific customizations.

> **Prefer automation?** Use [Terraform or CloudFormation](https://docs.valohai.com/installation-and-setup/aws/hybrid) for faster, repeatable deployments.

## Prerequisites

**From Valohai:**

* `valohai_assume_user` - ARN of the Valohai user (e.g., `arn:aws:iam::635691382966:user/valohai-customer-yourcompany`)
* `queue_address` - DNS name for your queue (e.g., `something.vqueue.net`)

**From your AWS account:**

* Admin access to AWS Console or CLI
* Region selected (consider GPU availability)
* EC2 key pair for SSH access

**Contact <support@valohai.com>** to receive your credentials before proceeding.

## Step 1: Configure IAM Roles

Create four IAM policies and roles that Valohai needs to manage resources.

### Create IAM Policies

Navigate to **AWS Console > IAM > Policies** and create these four policies.

**Important:** Replace placeholders before saving:

* `<AWS-ACCOUNT-ID>` with your 12-digit AWS account ID
* The bucket name `valohai-data-<AWS-ACCOUNT-ID>` then becomes, for example, `valohai-data-123456789012`
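
If you keep the policies as local files, the placeholder substitution can be scripted rather than done by hand. The following is a minimal sketch, not part of Valohai's tooling: the `render_policy` function and the `*.template.json` file names are hypothetical.

```shell
# Hypothetical helper: reads a policy template on stdin and replaces every
# <AWS-ACCOUNT-ID> placeholder with the account ID passed as the first argument.
render_policy() {
  sed "s/<AWS-ACCOUNT-ID>/$1/g"
}

# Example usage (requires configured AWS CLI credentials):
#   ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
#   render_policy "$ACCOUNT_ID" < ValohaiMasterPolicy.template.json > ValohaiMasterPolicy.json
```

Because the bucket name embeds the same placeholder, one substitution covers both the role ARN and the S3 resource ARNs.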

#### ValohaiQueuePolicy

Allows the queue instance to read secrets from Secrets Manager.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "0",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetResourcePolicy",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "secretsmanager:ResourceTag/valohai": "1"
        }
      }
    },
    {
      "Sid": "1",
      "Effect": "Allow",
      "Action": "secretsmanager:GetRandomPassword",
      "Resource": "*"
    }
  ]
}
```

#### ValohaiWorkerPolicy

Allows worker instances to describe themselves and to protect themselves from Auto Scaling scale-in.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "autoscaling:SetInstanceProtection",
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "1"
    },
    {
      "Action": "ec2:DescribeInstances",
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "2"
    }
  ]
}
```

#### ValohaiS3MultipartPolicy

Allows uploading large files (>5GB) to S3.

**Replace `<AWS-ACCOUNT-ID>` in both Resource ARNs.**

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MultipartAccess",
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions",
        "s3:ListMultipartUploadParts",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>",
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>/*"
      ]
    }
  ]
}
```

#### ValohaiMasterPolicy

Allows Valohai to manage EC2 resources and access storage.

**Replace all three `<AWS-ACCOUNT-ID>` placeholders** (one in the `ValohaiWorkerRole` ARN, two in the S3 resource ARNs).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "2",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeVpcs",
        "ec2:DescribeKeyPairs",
        "ec2:DescribeImages",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:DescribeInstanceAttribute",
        "ec2:CreateTags",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeRouteTables",
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeScalingActivities"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowUpdatingSpotLaunchTemplates",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateLaunchTemplate",
        "ec2:CreateLaunchTemplateVersion",
        "ec2:ModifyLaunchTemplate",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "ec2:RebootInstances",
        "autoscaling:UpdateAutoScalingGroup",
        "autoscaling:CreateOrUpdateTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:CreateAutoScalingGroup"
      ],
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringEquals": {
          "aws:ResourceTag/valohai": "1"
        }
      }
    },
    {
      "Sid": "ServiceLinkedRole",
      "Effect": "Allow",
      "Action": "iam:CreateServiceLinkedRole",
      "Resource": "arn:aws:iam::*:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
    },
    {
      "Sid": "4",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole",
        "iam:GetRole"
      ],
      "Resource": "arn:aws:iam::<AWS-ACCOUNT-ID>:role/ValohaiWorkerRole"
    },
    {
      "Sid": "0",
      "Effect": "Allow",
      "Condition": {
        "StringEquals": {
          "secretsmanager:ResourceTag/valohai": "1"
        }
      },
      "Action": [
        "secretsmanager:GetResourcePolicy",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": "*"
    },
    {
      "Action": "secretsmanager:GetRandomPassword",
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "1"
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>",
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>/*"
      ]
    }
  ]
}
```

### Create IAM Roles

Navigate to **AWS Console > IAM > Roles** and create these four roles.

| Role Name                | Use Case            | Attach Policy              | Additional Configuration            |
| ------------------------ | ------------------- | -------------------------- | ----------------------------------- |
| `ValohaiQueueRole`       | EC2                 | `ValohaiQueuePolicy`       | Instance profile auto-created       |
| `ValohaiWorkerRole`      | EC2                 | `ValohaiWorkerPolicy`      | Instance profile auto-created       |
| `ValohaiS3MultipartRole` | Another AWS Account | `ValohaiS3MultipartPolicy` | Account ID: `<YOUR-AWS-ACCOUNT-ID>` |
| `ValohaiMaster`          | Another AWS Account | `ValohaiMasterPolicy`      | Account ID: `635691382966`          |

**For ValohaiMaster role:**

After creation, verify the trust relationship. It should look like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::635691382966:user/valohai-customer-<YOUR-IDENTIFIER>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

The ARN here matches the `valohai_assume_user` provided by Valohai.

**Important:** `ValohaiQueueRole` and `ValohaiWorkerRole` need instance profiles. These are automatically created when using the Console. If using CLI, create them with:

```shell
aws iam create-instance-profile --instance-profile-name ValohaiQueueRole
aws iam add-role-to-instance-profile --instance-profile-name ValohaiQueueRole --role-name ValohaiQueueRole

aws iam create-instance-profile --instance-profile-name ValohaiWorkerRole
aws iam add-role-to-instance-profile --instance-profile-name ValohaiWorkerRole --role-name ValohaiWorkerRole
```
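
When working entirely from the CLI, the role has to exist before the instance profile commands above can succeed. The sketch below shows one way to create `ValohaiQueueRole`; `ValohaiWorkerRole` would follow the same pattern. The function names and the file names `ec2-trust.json` and `ValohaiQueuePolicy.json` are assumptions for illustration, and the policy file is expected to contain the `ValohaiQueuePolicy` JSON from this guide.

```shell
# Builds the ARN of a customer-managed policy from an account ID and policy name.
policy_arn() {
  printf 'arn:aws:iam::%s:policy/%s\n' "$1" "$2"
}

# Writes the trust policy that lets the EC2 service assume a role.
write_ec2_trust() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Creates the policy and the role, then attaches the policy to the role.
# Call as: create_queue_role <AWS-ACCOUNT-ID>
create_queue_role() {
  write_ec2_trust ec2-trust.json
  aws iam create-policy --policy-name ValohaiQueuePolicy \
    --policy-document file://ValohaiQueuePolicy.json
  aws iam create-role --role-name ValohaiQueueRole \
    --assume-role-policy-document file://ec2-trust.json
  aws iam attach-role-policy --role-name ValohaiQueueRole \
    --policy-arn "$(policy_arn "$1" ValohaiQueuePolicy)"
}
```

Nothing runs until you call `create_queue_role` with your account ID, so the block can be sourced and reviewed first.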

## Step 2: Create VPC and Networking

### Create VPC

Navigate to **AWS Console > VPC > Your VPCs > Create VPC**.

**Settings:**

* Name: `valohai-vpc`
* IPv4 CIDR: `10.0.0.0/16`
* No IPv6 CIDR
* Tenancy: Default
* Tags: Add `Key=valohai, Value=1`

### Create Subnets

Create one subnet per availability zone in your region.

Navigate to **VPC > Subnets > Create subnet**.

**Example for a region with 4 zones:**

| Name               | Availability Zone | IPv4 CIDR      |
| ------------------ | ----------------- | -------------- |
| `valohai-subnet-1` | `<region>a`       | `10.0.0.0/20`  |
| `valohai-subnet-2` | `<region>b`       | `10.0.16.0/20` |
| `valohai-subnet-3` | `<region>c`       | `10.0.32.0/20` |
| `valohai-subnet-4` | `<region>d`       | `10.0.48.0/20` |

Add `Key=valohai, Value=1` tag to each subnet.
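
The CIDR blocks in the table follow a simple pattern: each `/20` spans 16 values of the third octet, so the Nth subnet (counting from zero) starts at `10.0.(16*N).0`. A minimal sketch of that arithmetic, plus a loop that creates one subnet per zone, might look like this. The function names are hypothetical, and the loop assumes the region lists only standard availability zones.

```shell
# Prints the /20 CIDR for the Nth subnet (0-based) inside 10.0.0.0/16.
subnet_cidr() {
  echo "10.0.$(( $1 * 16 )).0/20"
}

# Sketch: creates one tagged subnet per availability zone.
# Call as: create_subnets <vpc-id> <region>
create_subnets() {
  i=0
  for az in $(aws ec2 describe-availability-zones --region "$2" \
      --query 'AvailabilityZones[].ZoneName' --output text); do
    aws ec2 create-subnet --region "$2" --vpc-id "$1" \
      --availability-zone "$az" --cidr-block "$(subnet_cidr "$i")" \
      --tag-specifications "ResourceType=subnet,Tags=[{Key=valohai,Value=1},{Key=Name,Value=valohai-subnet-$(( i + 1 ))}]"
    i=$(( i + 1 ))
  done
}
```

For example, `subnet_cidr 3` yields the fourth row of the table, `10.0.48.0/20`.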

### Create Internet Gateway

Navigate to **VPC > Internet Gateways > Create internet gateway**.

**Settings:**

* Name: `valohai-igw`
* Tags: Add `Key=valohai, Value=1`

After creation, attach it to `valohai-vpc`:

* Actions > Attach to VPC > Select `valohai-vpc`

### Configure Route Table

Navigate to **VPC > Route Tables**.

Find the main route table for `valohai-vpc` and rename it to `valohai-rt`.

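Edit the routes so the table contains the entries below. The `local` route already exists by default, so only the `0.0.0.0/0` route needs to be added: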

| Destination   | Target        |
| ------------- | ------------- |
| `10.0.0.0/16` | local         |
| `0.0.0.0/0`   | `valohai-igw` |

Add `Key=valohai, Value=1` tag to the route table.

## Step 3: Create Security Groups

### valohai-sg-workers

Navigate to **EC2 > Security Groups > Create security group**.

**Basic details:**

* Name: `valohai-sg-workers`
* Description: Security group for Valohai worker instances
* VPC: `valohai-vpc`

**Inbound rules:**

* None by default (add SSH from your IP if needed for debugging)

**Outbound rules:**

* Type: All traffic
* Destination: `0.0.0.0/0`

**Tags:** Add `Key=valohai, Value=1`

### valohai-sg-queue

**Basic details:**

* Name: `valohai-sg-queue`
* Description: Security group for Valohai queue instance
* VPC: `valohai-vpc`

**Inbound rules:**

| Type       | Protocol | Port  | Source               | Description                        |
| ---------- | -------- | ----- | -------------------- | ---------------------------------- |
| Custom TCP | TCP      | 80    | `0.0.0.0/0`          | Let's Encrypt HTTP challenge       |
| Custom TCP | TCP      | 63790 | `34.248.245.191/32`  | Redis from app.valohai.com         |
| Custom TCP | TCP      | 63790 | `63.34.156.112/32`   | Redis from Valohai scaling service |
| Custom TCP | TCP      | 63790 | `valohai-sg-workers` | Redis from workers                 |

**Outbound rules:**

* Type: All traffic
* Destination: `0.0.0.0/0`

**Tags:** Add `Key=valohai, Value=1`
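
The inbound rules for `valohai-sg-queue` can also be applied from the CLI once both groups exist. This is a sketch under that assumption; the function names are hypothetical, and the ports and addresses are copied from the table above.

```shell
# Prints the CIDR-based inbound rules for valohai-sg-queue as "port cidr" pairs.
queue_ingress_rules() {
  cat <<'EOF'
80 0.0.0.0/0
63790 34.248.245.191/32
63790 63.34.156.112/32
EOF
}

# Sketch: applies the rules to an existing security group.
# Call as: authorize_queue_ingress <queue-sg-id> <workers-sg-id>
authorize_queue_ingress() {
  queue_ingress_rules | while read -r port cidr; do
    aws ec2 authorize-security-group-ingress --group-id "$1" \
      --protocol tcp --port "$port" --cidr "$cidr"
  done
  # Workers are allowed by security group reference rather than by CIDR.
  aws ec2 authorize-security-group-ingress --group-id "$1" \
    --protocol tcp --port 63790 --source-group "$2"
}
```

Keeping the rule list in one function makes it easy to compare against the table if Valohai's source addresses change.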

## Step 4: Create Secrets Manager Secret

Navigate to **AWS Console > Secrets Manager > Store a new secret**.

**Secret type:**

* Other type of secret

**Key/value:**

* Plaintext tab
* Generate a strong password using uppercase letters, lowercase letters, and numbers (no special characters)

**Secret name:** `valohai_redis_server`

**Tags:** Add `Key=valohai, Value=1`

**Rotation:** Disable automatic rotation

Save the secret. The user data script in Step 6 reads the password from Secrets Manager by its name, `valohai_redis_server`.
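
If you prefer the CLI, a password that meets the letters-and-digits requirement can be generated locally and stored in one step. This is a minimal sketch rather than an official procedure; the function names are hypothetical, and the password length of 32 is an arbitrary choice.

```shell
# Generates a 32-character password from letters and digits only,
# reading from the kernel's random source.
generate_password() {
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32
}

# Sketch: stores the password as a plaintext secret carrying the valohai tag.
# Call as: create_redis_secret <region>
create_redis_secret() {
  aws secretsmanager create-secret --region "$1" \
    --name valohai_redis_server \
    --secret-string "$(generate_password)" \
    --tags Key=valohai,Value=1
}
```

Passing the generated value straight into `create-secret` means the password never has to be displayed or copied.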

## Step 5: Allocate Elastic IP

Navigate to **EC2 > Elastic IPs > Allocate Elastic IP address**.

**Settings:**

* Network Border Group: Default
* Public IPv4 address pool: Amazon's pool of IPv4 addresses
* Tags: `Name=valohai-ip-queue`, `Key=valohai, Value=1`

Allocate the IP. You'll associate it with the queue instance in the next step.

## Step 6: Create Queue Instance

Navigate to **EC2 > Instances > Launch instance**.

### Basic Configuration

**Name:** `valohai-queue`

**Application and OS Images:**

* Ubuntu Server 24.04 LTS
* Architecture: 64-bit (x86)

**Instance type:** `t3.medium`

**Key pair:** Select your existing key pair

### Network Settings

**VPC:** `valohai-vpc`

**Subnet:** Any subnet (e.g., `valohai-subnet-1`)

**Auto-assign public IP:** Disable (we'll use Elastic IP)

**Firewall (security groups):** Select `valohai-sg-queue`

### Storage

**Root volume:**

* Size: 32 GiB
* Volume type: gp3 (General Purpose SSD)
* Encrypted: Optional (recommended)

### Advanced Details

**IAM instance profile:** `ValohaiQueueRole`

**User data:** Paste this script, **replacing `<queue_address>` with your actual queue address** from Valohai:

```bash
#!/bin/bash
sudo apt-get update && sudo apt-get install awscli -y

# Request an IMDSv2 session token, then use it to look up this instance's region.
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
REGION=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/region/)

# Read the Redis password from the Secrets Manager secret created in Step 4.
PASSWORD=$(aws secretsmanager get-secret-value --secret-id valohai_redis_server --region "$REGION" --query SecretString --output text)

# Replace <queue_address> with the DNS name provided by Valohai.
QUEUE=<queue_address>

# Download and run the queue setup script, then clear the password variable.
curl https://raw.githubusercontent.com/valohai/worker-queue/main/host/setup.sh | sudo QUEUE_ADDRESS="$QUEUE" REDIS_PASSWORD="$PASSWORD" bash
unset PASSWORD
```

**Tags:** Add `Key=valohai, Value=1`

Launch the instance.

### Associate Elastic IP

After the instance is running:

1. Navigate to **EC2 > Elastic IPs**
2. Select your `valohai-ip-queue`
3. Actions > Associate Elastic IP address
4. Select the `valohai-queue` instance
5. Associate

## Step 7: Create S3 Bucket

Navigate to **S3 > Create bucket**.

**Bucket name:** `valohai-data-<AWS-ACCOUNT-ID>`

* Replace `<AWS-ACCOUNT-ID>` with your 12-digit account ID
* Example: `valohai-data-123456789012`

**Region:** Same as your EC2 instances

**Block Public Access:** Enable (all checkboxes)

**Bucket Versioning:** Disabled

**Tags:** Add `Key=valohai, Value=1`

**Encryption:** Enable (AWS managed keys)

Create the bucket.

### Configure CORS

After creating the bucket:

1. Open the bucket
2. Navigate to **Permissions** tab
3. Scroll to **Cross-origin resource sharing (CORS)**
4. Click Edit and paste:

```json
[
  {
    "AllowedHeaders": [
      "Authorization"
    ],
    "AllowedMethods": [
      "GET"
    ],
    "AllowedOrigins": [
      "*"
    ],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000
  },
  {
    "AllowedHeaders": [
      "Authorization"
    ],
    "AllowedMethods": [
      "POST"
    ],
    "AllowedOrigins": [
      "https://app.valohai.com"
    ],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000
  }
]
```

Save changes.
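
The Console accepts the bare rule array shown above, while `aws s3api put-bucket-cors` expects the same rules wrapped in a `CORSRules` object. A small sketch of that conversion follows; the `wrap_cors_rules` function and the file names `cors-rules.json` and `cors.json` are hypothetical.

```shell
# Wraps a Console-style CORS rule array (read from stdin) in the
# {"CORSRules": [...]} object that `aws s3api put-bucket-cors` expects.
wrap_cors_rules() {
  printf '{"CORSRules": %s}\n' "$(cat)"
}

# Example usage, assuming the JSON above is saved as cors-rules.json:
#   wrap_cors_rules < cors-rules.json > cors.json
#   aws s3api put-bucket-cors --bucket valohai-data-<AWS-ACCOUNT-ID> \
#     --cors-configuration file://cors.json
```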

## Step 8: Verify and Share Information

### Collect Required Information

Gather these values to send to Valohai:

**Account and Location:**

* AWS Account ID: `____________`
* Region: `____________`

**IAM:**

* ValohaiMaster Role ARN: `arn:aws:iam::<account-id>:role/ValohaiMaster`

**Networking:**

* VPC ID: `vpc-____________`
* Subnet IDs: `subnet-________, subnet-________, ...`

**Queue Instance:**

* Private IP: `____________`
* Public IP (Elastic IP): `____________`

**Storage:**

* S3 Bucket Name: `valohai-data-<account-id>`
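
Most of these values can be looked up from the CLI instead of the Console. The sketch below assumes every resource carries the `valohai=1` tag and uses the names from this guide; the function names are hypothetical, and the output should be checked against the Console before you send it.

```shell
# Builds the ValohaiMaster role ARN for a given account ID.
master_role_arn() {
  printf 'arn:aws:iam::%s:role/ValohaiMaster\n' "$1"
}

# Sketch: prints the checklist values for one region.
# Call as: collect_valohai_info <region>
collect_valohai_info() {
  account_id=$(aws sts get-caller-identity --query Account --output text)
  echo "AWS Account ID: $account_id"
  echo "Region: $1"
  echo "ValohaiMaster Role ARN: $(master_role_arn "$account_id")"
  echo "VPC ID: $(aws ec2 describe-vpcs --region "$1" \
    --filters Name=tag:valohai,Values=1 \
    --query 'Vpcs[].VpcId' --output text)"
  echo "Subnet IDs: $(aws ec2 describe-subnets --region "$1" \
    --filters Name=tag:valohai,Values=1 \
    --query 'Subnets[].SubnetId' --output text)"
  echo "Queue private/public IP: $(aws ec2 describe-instances --region "$1" \
    --filters Name=tag:Name,Values=valohai-queue Name=instance-state-name,Values=running \
    --query 'Reservations[].Instances[].[PrivateIpAddress,PublicIpAddress]' --output text)"
  echo "S3 Bucket Name: valohai-data-$account_id"
}
```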

### Send to Valohai

Email this information to your Valohai contact at **<support@valohai.com>** using your organization's secure communication method (e.g., password-protected document, encrypted email).

### Verify Queue Instance

SSH into the queue instance to verify it's running correctly:

```shell
ssh -i your-key.pem ubuntu@<elastic-ip>
sudo systemctl status valohai-queue
```

The service should be reported as active and running. Check logs with:

```shell
sudo journalctl -u valohai-queue -f
```

## Troubleshooting

### Queue instance not accessible

**Check security group rules:**

```shell
aws ec2 describe-security-groups --group-ids <sg-id>
```

Verify port 63790 is open from `34.248.245.191/32` and `63.34.156.112/32`.

**Check instance status:**

```shell
aws ec2 describe-instance-status --instance-ids <instance-id>
```

**Verify user data script ran:**

```shell
ssh -i your-key.pem ubuntu@<elastic-ip>
cat /var/log/cloud-init-output.log
```

### IAM role issues

**Verify trust relationship:**

```shell
aws iam get-role --role-name ValohaiMaster
```

Check the `AssumeRolePolicyDocument` contains the correct Valohai ARN.

**Check instance profile:**

```shell
aws iam get-instance-profile --instance-profile-name ValohaiWorkerRole
```

### S3 access errors

**Verify the CORS configuration:**

```shell
aws s3api get-bucket-cors --bucket valohai-data-<account-id>
```

**Test worker access:**

```shell
# From a worker instance
aws s3 ls s3://valohai-data-<account-id>/
```

### Cannot reach queue from workers

**Check network connectivity:**

```shell
# From a worker instance
telnet <queue-private-ip> 63790
```

**Verify security group:**

* Ensure `valohai-sg-queue` allows inbound from `valohai-sg-workers`
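
If `telnet` is not installed on the worker, the same connectivity check can be done with bash alone. This is a sketch that assumes bash and the coreutils `timeout` command are available; the `check_port` name and the 5-second limit are arbitrary choices.

```shell
# Returns 0 if a TCP connection to <host> <port> succeeds within 5 seconds.
# Relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
check_port() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example usage from a worker instance:
#   check_port <queue-private-ip> 63790 && echo reachable || echo unreachable
```

A timeout usually points to a security group or routing problem, while an immediate failure suggests nothing is listening on the port.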

## Getting Help

**Before contacting support**, collect:

* CloudTrail logs for any permission errors
* Security group configurations
* IAM role trust relationships
* Queue instance logs (`/var/log/cloud-init-output.log`)

**Contact:** <support@valohai.com>

**Include in your message:**

* AWS Account ID
* Region
* Specific error messages
* Steps you've already tried
