# Hybrid Deployment - Manual Setup

This guide walks through manually creating all AWS resources for a Valohai hybrid deployment. Use this if you can't use CloudFormation or Terraform, or need specific customizations.

> **Prefer automation?** Use [Terraform or CloudFormation](/installation-and-setup/aws/hybrid.md) for faster, repeatable deployments.

## Prerequisites

**From Valohai:**

* `valohai_assume_user` - ARN of the Valohai user (e.g., `arn:aws:iam::635691382966:user/valohai-customer-yourcompany`)
* `queue_address` - DNS name for your queue (e.g., `something.vqueue.net`)

**From your AWS account:**

* Admin access to AWS Console or CLI
* Region selected (consider GPU availability)
* EC2 key pair for SSH access

**Contact <support@valohai.com>** to receive your credentials before proceeding.

## Step 1: Configure IAM Roles

Create four IAM policies and roles that Valohai needs to manage resources.

### Create IAM Policies

Navigate to **AWS Console > IAM > Policies** and create these four policies.

**Important:** Replace placeholders before saving:

* `<AWS-ACCOUNT-ID>` with your 12-digit AWS account ID
* `valohai-data-<AWS-ACCOUNT-ID>` with your bucket name, which embeds the same account ID (e.g., `valohai-data-123456789012`)
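
If you keep the policy documents as local files, the substitution can be scripted instead of edited by hand. The following is a minimal sketch; `render_policy` and the file names in the usage comment are hypothetical helpers, not part of Valohai's tooling.

```shell
# Hypothetical helper: replace the <AWS-ACCOUNT-ID> placeholder in a policy
# document read from stdin and write the result to stdout.
render_policy() {
  account_id="$1"
  # Reject empty input or anything containing a non-digit character.
  case "$account_id" in
    *[!0-9]*|"") echo "account ID must be numeric" >&2; return 1 ;;
  esac
  # AWS account IDs are exactly 12 digits long.
  [ "${#account_id}" -eq 12 ] || { echo "account ID must be 12 digits" >&2; return 1; }
  sed "s/<AWS-ACCOUNT-ID>/${account_id}/g"
}

# Example usage (file names are assumptions):
#   render_policy 123456789012 < ValohaiMasterPolicy.template.json > ValohaiMasterPolicy.json
```

Your account ID can be looked up with `aws sts get-caller-identity --query Account --output text`.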

#### ValohaiQueuePolicy

Allows the queue instance to read secrets from Secrets Manager.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "0",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetResourcePolicy",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "secretsmanager:ResourceTag/valohai": "1"
        }
      }
    },
    {
      "Sid": "1",
      "Effect": "Allow",
      "Action": "secretsmanager:GetRandomPassword",
      "Resource": "*"
    }
  ]
}
```

#### ValohaiWorkerPolicy

Allows worker instances to describe themselves and protect themselves from scale-in.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "autoscaling:SetInstanceProtection",
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "1"
    },
    {
      "Action": "ec2:DescribeInstances",
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "2"
    }
  ]
}
```

#### ValohaiS3MultipartPolicy

Allows uploading large files (>5GB) to S3.

**Replace `<AWS-ACCOUNT-ID>` in both Resource ARNs.**

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MultipartAccess",
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions",
        "s3:ListMultipartUploadParts",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>",
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>/*"
      ]
    }
  ]
}
```

#### ValohaiMasterPolicy

Allows Valohai to manage EC2 resources and access storage.

**Replace all three `<AWS-ACCOUNT-ID>` placeholders (one role ARN and two bucket ARNs).**

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "2",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeVpcs",
        "ec2:DescribeKeyPairs",
        "ec2:DescribeImages",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:DescribeInstanceAttribute",
        "ec2:CreateTags",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeRouteTables",
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeScalingActivities"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowUpdatingSpotLaunchTemplates",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateLaunchTemplate",
        "ec2:CreateLaunchTemplateVersion",
        "ec2:ModifyLaunchTemplate",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "ec2:RebootInstances",
        "autoscaling:UpdateAutoScalingGroup",
        "autoscaling:CreateOrUpdateTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:CreateAutoScalingGroup"
      ],
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringEquals": {
          "aws:ResourceTag/valohai": "1"
        }
      }
    },
    {
      "Sid": "ServiceLinkedRole",
      "Effect": "Allow",
      "Action": "iam:CreateServiceLinkedRole",
      "Resource": "arn:aws:iam::*:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
    },
    {
      "Sid": "4",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole",
        "iam:GetRole"
      ],
      "Resource": "arn:aws:iam::<AWS-ACCOUNT-ID>:role/ValohaiWorkerRole"
    },
    {
      "Sid": "0",
      "Effect": "Allow",
      "Condition": {
        "StringEquals": {
          "secretsmanager:ResourceTag/valohai": "1"
        }
      },
      "Action": [
        "secretsmanager:GetResourcePolicy",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": "*"
    },
    {
      "Action": "secretsmanager:GetRandomPassword",
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "1"
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>",
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>/*"
      ]
    }
  ]
}
```
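
If you prefer the CLI to the console, the four policies can be created with `aws iam create-policy`. This is a minimal sketch, assuming the JSON documents above are saved in one directory as `<PolicyName>.json` with placeholders already replaced; the function name and file layout are assumptions.

```shell
# Hypothetical helper: create the four policies from JSON files in a directory.
# Assumes files named ValohaiQueuePolicy.json, ValohaiWorkerPolicy.json, etc.
create_valohai_policies() {
  policy_dir="$1"
  for name in ValohaiQueuePolicy ValohaiWorkerPolicy ValohaiS3MultipartPolicy ValohaiMasterPolicy; do
    aws iam create-policy \
      --policy-name "$name" \
      --policy-document "file://${policy_dir}/${name}.json" || return 1
  done
}

# Example usage (directory is an assumption):
#   create_valohai_policies ./policies
```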

### Create IAM Roles

Navigate to **AWS Console > IAM > Roles** and create these four roles.

| Role Name                | Use Case            | Attach Policy              | Additional Configuration            |
| ------------------------ | ------------------- | -------------------------- | ----------------------------------- |
| `ValohaiQueueRole`       | EC2                 | `ValohaiQueuePolicy`       | Instance profile auto-created       |
| `ValohaiWorkerRole`      | EC2                 | `ValohaiWorkerPolicy`      | Instance profile auto-created       |
| `ValohaiS3MultipartRole` | Another AWS Account | `ValohaiS3MultipartPolicy` | Account ID: `<AWS-ACCOUNT-ID>`      |
| `ValohaiMaster`          | Another AWS Account | `ValohaiMasterPolicy`      | Account ID: `635691382966`          |

**For ValohaiMaster role:**

After creation, verify the trust relationship. It should look like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::635691382966:user/valohai-customer-<YOUR-IDENTIFIER>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

The principal ARN must match the `valohai_assume_user` value provided by Valohai.
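
When creating the role from the CLI, the trust policy can be generated from the ARN directly. The sketch below assumes `ValohaiMasterPolicy` already exists in your account; both function names are hypothetical.

```shell
# Hypothetical helper: print a trust policy that lets the given principal
# assume the role. Pass the valohai_assume_user ARN you received from Valohai.
trust_policy_for() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "$1" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Hypothetical helper: create ValohaiMaster and attach ValohaiMasterPolicy.
# $1 = valohai_assume_user ARN, $2 = your 12-digit AWS account ID.
create_valohai_master_role() {
  aws iam create-role \
    --role-name ValohaiMaster \
    --assume-role-policy-document "$(trust_policy_for "$1")" &&
  aws iam attach-role-policy \
    --role-name ValohaiMaster \
    --policy-arn "arn:aws:iam::$2:policy/ValohaiMasterPolicy"
}
```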

**Important:** `ValohaiQueueRole` and `ValohaiWorkerRole` need instance profiles. These are automatically created when using the Console. If using CLI, create them with:

```shell
aws iam create-instance-profile --instance-profile-name ValohaiQueueRole
aws iam add-role-to-instance-profile --instance-profile-name ValohaiQueueRole --role-name ValohaiQueueRole

aws iam create-instance-profile --instance-profile-name ValohaiWorkerRole
aws iam add-role-to-instance-profile --instance-profile-name ValohaiWorkerRole --role-name ValohaiWorkerRole
```

## Step 2: Create VPC and Networking

### Create VPC

Navigate to **AWS Console > VPC > Your VPCs > Create VPC**.

**Settings:**

* Name: `valohai-vpc`
* IPv4 CIDR: `10.0.0.0/16`
* No IPv6 CIDR
* Tenancy: Default
* Tags: Add `Key=valohai, Value=1`
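
The same settings can be applied from the CLI. This is a sketch under the assumption that your default region is already configured; `create_valohai_vpc` is a hypothetical wrapper that prints the new VPC ID. Default tenancy and no IPv6 block are what `create-vpc` does when those options are omitted.

```shell
# Hypothetical helper: create the tagged VPC and print its ID.
create_valohai_vpc() {
  aws ec2 create-vpc \
    --cidr-block 10.0.0.0/16 \
    --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=valohai-vpc},{Key=valohai,Value=1}]' \
    --query 'Vpc.VpcId' --output text
}
```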

### Create Subnets

Create one subnet per availability zone in your region.

Navigate to **VPC > Subnets > Create subnet**.

**Example for a region with 4 zones:**

| Name               | Availability Zone | IPv4 CIDR      |
| ------------------ | ----------------- | -------------- |
| `valohai-subnet-1` | `<region>a`       | `10.0.0.0/20`  |
| `valohai-subnet-2` | `<region>b`       | `10.0.16.0/20` |
| `valohai-subnet-3` | `<region>c`       | `10.0.32.0/20` |
| `valohai-subnet-4` | `<region>d`       | `10.0.48.0/20` |

Add `Key=valohai, Value=1` tag to each subnet.
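
The CIDR blocks in the table follow a simple pattern: each `/20` covers 16 values of the third octet, so subnet N starts at `(N - 1) * 16`. A minimal sketch of that arithmetic, with a hypothetical wrapper around `aws ec2 create-subnet`:

```shell
# Print the /20 CIDR for the Nth subnet (1-based) inside 10.0.0.0/16.
# Each /20 spans 16 values of the third octet: 0, 16, 32, 48, ...
subnet_cidr() {
  echo "10.0.$(( ($1 - 1) * 16 )).0/20"
}

# Hypothetical helper: create one tagged subnet.
# $1 = VPC ID, $2 = subnet index (1-based), $3 = availability zone name.
create_valohai_subnet() {
  aws ec2 create-subnet \
    --vpc-id "$1" \
    --availability-zone "$3" \
    --cidr-block "$(subnet_cidr "$2")" \
    --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=valohai-subnet-$2},{Key=valohai,Value=1}]"
}
```

A `/16` holds 16 such `/20` blocks, so indexes 1 through 16 are valid.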

### Create Internet Gateway

Navigate to **VPC > Internet Gateways > Create internet gateway**.

**Settings:**

* Name: `valohai-igw`
* Tags: Add `Key=valohai, Value=1`

After creation, attach it to `valohai-vpc`:

* Actions > Attach to VPC > Select `valohai-vpc`

### Configure Route Table

Navigate to **VPC > Route Tables**.

Find the main route table for `valohai-vpc` and rename it to `valohai-rt`.

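Edit routes so that the table contains the following entries. The `local` route already exists; add the `0.0.0.0/0` route: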

| Destination   | Target        |
| ------------- | ------------- |
| `10.0.0.0/16` | local         |
| `0.0.0.0/0`   | `valohai-igw` |

Add `Key=valohai, Value=1` tag to the route table.
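
For CLI users, the gateway and default route can be set up together. This sketch assumes you already know the VPC ID and the ID of its main route table; `create_valohai_igw` is a hypothetical name.

```shell
# Hypothetical helper: create and attach the internet gateway, then add the
# default route. $1 = VPC ID, $2 = ID of the VPC's main route table.
create_valohai_igw() {
  igw_id=$(aws ec2 create-internet-gateway \
    --tag-specifications 'ResourceType=internet-gateway,Tags=[{Key=Name,Value=valohai-igw},{Key=valohai,Value=1}]' \
    --query 'InternetGateway.InternetGatewayId' --output text) || return 1
  aws ec2 attach-internet-gateway --internet-gateway-id "$igw_id" --vpc-id "$1" || return 1
  aws ec2 create-route --route-table-id "$2" \
    --destination-cidr-block 0.0.0.0/0 --gateway-id "$igw_id"
}
```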

## Step 3: Create Security Groups

### valohai-sg-workers

Navigate to **EC2 > Security Groups > Create security group**.

**Basic details:**

* Name: `valohai-sg-workers`
* Description: Security group for Valohai worker instances
* VPC: `valohai-vpc`

**Inbound rules:**

* None by default (add SSH from your IP if needed for debugging)

**Outbound rules:**

* Type: All traffic
* Destination: `0.0.0.0/0`

**Tags:** Add `Key=valohai, Value=1`

### valohai-sg-queue

**Basic details:**

* Name: `valohai-sg-queue`
* Description: Security group for Valohai queue instance
* VPC: `valohai-vpc`

**Inbound rules:**

| Type       | Protocol | Port  | Source               | Description                        |
| ---------- | -------- | ----- | -------------------- | ---------------------------------- |
| Custom TCP | TCP      | 80    | `0.0.0.0/0`          | Let's Encrypt HTTP challenge       |
| Custom TCP | TCP      | 63790 | `34.248.245.191/32`  | Redis from app.valohai.com         |
| Custom TCP | TCP      | 63790 | `63.34.156.112/32`   | Redis from Valohai scaling service |
| Custom TCP | TCP      | 63790 | `valohai-sg-workers` | Redis from workers                 |

**Outbound rules:**

* Type: All traffic
* Destination: `0.0.0.0/0`

**Tags:** Add `Key=valohai, Value=1`
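
The inbound rules in the table map onto `aws ec2 authorize-security-group-ingress` calls. The sketch below assumes both security groups already exist; the function name is hypothetical.

```shell
# Hypothetical helper: add the queue security group's inbound rules.
# $1 = ID of valohai-sg-queue, $2 = ID of valohai-sg-workers.
authorize_queue_ingress() {
  # Let's Encrypt HTTP challenge.
  aws ec2 authorize-security-group-ingress --group-id "$1" \
    --protocol tcp --port 80 --cidr 0.0.0.0/0 || return 1
  # Redis from app.valohai.com and the Valohai scaling service.
  for cidr in 34.248.245.191/32 63.34.156.112/32; do
    aws ec2 authorize-security-group-ingress --group-id "$1" \
      --protocol tcp --port 63790 --cidr "$cidr" || return 1
  done
  # Redis from worker instances, referenced by security group.
  aws ec2 authorize-security-group-ingress --group-id "$1" \
    --protocol tcp --port 63790 --source-group "$2"
}
```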

## Step 4: Create Secrets Manager Secret

Navigate to **AWS Console > Secrets Manager > Store a new secret**.

**Secret type:**

* Other type of secret

**Key/value:**

* Plaintext tab
* Enter a strong password made up of uppercase letters, lowercase letters, and numbers (no special characters)

**Secret name:** `valohai_redis_server`

**Tags:** Add `Key=valohai, Value=1`

**Rotation:** Disable automatic rotation

Save the secret. The queue instance's user data script in Step 6 retrieves the password from Secrets Manager by this name.
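
One way to produce a password that satisfies the character constraint and store it as plaintext is sketched below. The 32-character length and both function names are assumptions, not Valohai requirements.

```shell
# Generate a 32-character password from letters and digits only.
# Reads 4096 random bytes, keeps the alphanumeric ones, and takes the first 32.
generate_redis_password() {
  head -c 4096 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c1-32
}

# Hypothetical helper: store the password as a plaintext secret under the
# name the queue's user data script expects.
create_redis_secret() {
  aws secretsmanager create-secret \
    --name valohai_redis_server \
    --secret-string "$(generate_redis_password)" \
    --tags Key=valohai,Value=1
}
```

The secret is stored as a plain string rather than a JSON key/value pair, matching the **Plaintext** tab in the console.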

## Step 5: Allocate Elastic IP

Navigate to **EC2 > Elastic IPs > Allocate Elastic IP address**.

**Settings:**

* Network Border Group: Default
* Public IPv4 address pool: Amazon's pool of IPv4 addresses
* Tags: Add `Key=Name, Value=valohai-ip-queue` and `Key=valohai, Value=1`

Allocate the IP. You'll associate it with the queue instance in the next step.

## Step 6: Create Queue Instance

Navigate to **EC2 > Instances > Launch instance**.

### Basic Configuration

**Name:** `valohai-queue`

**Application and OS Images:**

* Ubuntu Server 24.04 LTS
* Architecture: 64-bit (x86)

**Instance type:** `t3.medium`

**Key pair:** Select your existing key pair

### Network Settings

**VPC:** `valohai-vpc`

**Subnet:** Any subnet (e.g., `valohai-subnet-1`)

**Auto-assign public IP:** Disable (we'll use Elastic IP)

**Firewall (security groups):** Select `valohai-sg-queue`

### Storage

**Root volume:**

* Size: 32 GiB
* Volume type: gp3 (General Purpose SSD)
* Encrypted: Optional (recommended)

### Advanced Details

**IAM instance profile:** `ValohaiQueueRole`

**User data:** Paste this script, **replacing `<queue_address>` with your actual queue address** from Valohai:

```bash
#!/bin/bash
# Install the AWS CLI, used below to read the Redis password.
sudo apt-get update && sudo apt-get install awscli -y
# Request an IMDSv2 session token, then look up the instance's region.
export TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
export REGION=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/region/)
# Extract the SecretString value of the valohai_redis_server secret.
export PASSWORD=$(aws secretsmanager get-secret-value --secret-id valohai_redis_server --region "$REGION" | sed -n 's|.*"SecretString": *"\([^"]*\)".*|\1|p')
export QUEUE=<queue_address>
# Download and run the Valohai queue setup script.
curl https://raw.githubusercontent.com/valohai/worker-queue/main/host/setup.sh | sudo QUEUE_ADDRESS=$QUEUE REDIS_PASSWORD=$PASSWORD bash
unset PASSWORD

**Tags:** Add `Key=valohai, Value=1`

Launch the instance.

### Associate Elastic IP

After the instance is running:

1. Navigate to **EC2 > Elastic IPs**
2. Select your `valohai-ip-queue`
3. Actions > Associate Elastic IP address
4. Select the `valohai-queue` instance
5. Associate
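
From the CLI, allocation (Step 5) and association can be done in one go once the instance exists. This is a sketch; `attach_queue_elastic_ip` is a hypothetical name, and it allocates a new address rather than reusing one you already allocated in the console.

```shell
# Hypothetical helper: allocate a tagged Elastic IP and attach it to the
# queue instance. $1 = instance ID of valohai-queue.
attach_queue_elastic_ip() {
  alloc_id=$(aws ec2 allocate-address --domain vpc \
    --tag-specifications 'ResourceType=elastic-ip,Tags=[{Key=Name,Value=valohai-ip-queue},{Key=valohai,Value=1}]' \
    --query 'AllocationId' --output text) || return 1
  aws ec2 associate-address --instance-id "$1" --allocation-id "$alloc_id"
}
```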

## Step 7: Create S3 Bucket

Navigate to **S3 > Create bucket**.

**Bucket name:** `valohai-data-<AWS-ACCOUNT-ID>`

* Replace `<AWS-ACCOUNT-ID>` with your 12-digit account ID
* Example: `valohai-data-123456789012`

**Region:** Same as your EC2 instances

**Block Public Access:** Enable (all checkboxes)

**Bucket Versioning:** Disabled

**Tags:** Add `Key=valohai, Value=1`

**Encryption:** Enable (AWS managed keys)

Create the bucket.

### Configure CORS

After creating the bucket:

1. Open the bucket
2. Navigate to **Permissions** tab
3. Scroll to **Cross-origin resource sharing (CORS)**
4. Click Edit and paste:

```json
[
  {
    "AllowedHeaders": [
      "Authorization"
    ],
    "AllowedMethods": [
      "GET"
    ],
    "AllowedOrigins": [
      "*"
    ],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000
  },
  {
    "AllowedHeaders": [
      "Authorization"
    ],
    "AllowedMethods": [
      "POST"
    ],
    "AllowedOrigins": [
      "https://app.valohai.com"
    ],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000
  }
]
```

Save changes.
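
If you apply CORS with the CLI instead, note that `aws s3api put-bucket-cors` expects the rule array wrapped in a `CORSRules` object, unlike the console editor. A minimal sketch, assuming the array above is saved to a local file; both function names are hypothetical.

```shell
# Hypothetical helper: wrap a console-style CORS rule array (read from stdin)
# in the {"CORSRules": [...]} object that the s3api CLI expects.
wrap_cors_rules() {
  printf '{"CORSRules": %s}\n' "$(cat)"
}

# Hypothetical helper: apply the rules. $1 = bucket name, $2 = file holding
# the JSON array shown above.
put_valohai_cors() {
  aws s3api put-bucket-cors --bucket "$1" \
    --cors-configuration "$(wrap_cors_rules < "$2")"
}

# Example usage (file name is an assumption):
#   put_valohai_cors valohai-data-123456789012 cors.json
```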

## Step 8: Verify and Share Information

### Collect Required Information

Gather these values to send to Valohai:

**Account and Location:**

* AWS Account ID: `____________`
* Region: `____________`

**IAM:**

* ValohaiMaster Role ARN: `arn:aws:iam::<account-id>:role/ValohaiMaster`

**Networking:**

* VPC ID: `vpc-____________`
* Subnet IDs: `subnet-________, subnet-________, ...`

**Queue Instance:**

* Private IP: `____________`
* Public IP (Elastic IP): `____________`

**Storage:**

* S3 Bucket Name: `valohai-data-<account-id>`
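
Most of these values can be read back with the AWS CLI. The sketch below assumes you used the resource names from this guide and have a default region configured; `collect_valohai_info` is a hypothetical helper.

```shell
# Hypothetical helper: print the values requested above.
# Assumes the resource names from this guide and a configured default region.
collect_valohai_info() {
  account_id=$(aws sts get-caller-identity --query Account --output text) || return 1
  vpc_id=$(aws ec2 describe-vpcs --filters Name=tag:Name,Values=valohai-vpc \
    --query 'Vpcs[0].VpcId' --output text) || return 1
  echo "AWS Account ID: $account_id"
  echo "Region: $(aws configure get region)"
  echo "ValohaiMaster Role ARN: arn:aws:iam::${account_id}:role/ValohaiMaster"
  echo "VPC ID: $vpc_id"
  echo "Subnet IDs: $(aws ec2 describe-subnets --filters "Name=vpc-id,Values=$vpc_id" \
    --query 'Subnets[].SubnetId' --output text)"
  echo "Queue IPs (private, public): $(aws ec2 describe-instances \
    --filters Name=tag:Name,Values=valohai-queue Name=instance-state-name,Values=running \
    --query 'Reservations[0].Instances[0].[PrivateIpAddress,PublicIpAddress]' --output text)"
  echo "S3 Bucket Name: valohai-data-${account_id}"
}
```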

### Send to Valohai

Email this information to your Valohai contact at **<support@valohai.com>** using your organization's secure communication method (e.g., password-protected document, encrypted email).

### Verify Queue Instance

SSH into the queue instance to verify it's running correctly:

```shell
ssh -i your-key.pem ubuntu@<elastic-ip>
sudo systemctl status valohai-queue
```

You should see the service is active and running. Check logs with:

```shell
sudo journalctl -u valohai-queue -f
```

## Troubleshooting

### Queue instance not accessible

**Check security group rules:**

```shell
aws ec2 describe-security-groups --group-ids <sg-id>
```

Verify port 63790 is open from `34.248.245.191/32` and `63.34.156.112/32`.

**Check instance status:**

```shell
aws ec2 describe-instance-status --instance-ids <instance-id>
```

**Verify user data script ran:**

```shell
ssh -i your-key.pem ubuntu@<elastic-ip>
cat /var/log/cloud-init-output.log
```

### IAM role issues

**Verify trust relationship:**

```shell
aws iam get-role --role-name ValohaiMaster
```

Check the `AssumeRolePolicyDocument` contains the correct Valohai ARN.

**Check instance profile:**

```shell
aws iam get-instance-profile --instance-profile-name ValohaiWorkerRole
```

### S3 access errors

**Verify the CORS configuration:**

```shell
aws s3api get-bucket-cors --bucket valohai-data-<account-id>
```

**Test worker access:**

```shell
# From a worker instance
aws s3 ls s3://valohai-data-<account-id>/
```

### Cannot reach queue from workers

**Check network connectivity:**

```shell
# From a worker instance
telnet <queue-private-ip> 63790
```

**Verify security group:**

* Ensure `valohai-sg-queue` allows inbound from `valohai-sg-workers`

## Getting Help

**Before contacting support**, collect:

* CloudTrail logs for any permission errors
* Security group configurations
* IAM role trust relationships
* Queue instance logs (`/var/log/cloud-init-output.log`)

**Contact:** <support@valohai.com>

**Include in your message:**

* AWS Account ID
* Region
* Specific error messages
* Steps you've already tried


