Hybrid Deployment - Manual Setup

Step-by-step guide to manually deploy Valohai workers to AWS using the Console or CLI

This guide walks through manually creating all AWS resources for a Valohai hybrid deployment. Use this if you can't use CloudFormation or Terraform, or need specific customizations.

Prefer automation? Use Terraform or CloudFormation for faster, repeatable deployments.

Prerequisites

From Valohai:

  • valohai_assume_user - ARN of the Valohai user (e.g., arn:aws:iam::635691382966:user/valohai-customer-yourcompany)

  • queue_address - DNS name for your queue (e.g., something.vqueue.net)

From your AWS account:

  • Admin access to AWS Console or CLI

  • Region selected (consider GPU availability)

  • EC2 key pair for SSH access

Contact [email protected] to receive your credentials before proceeding.

Step 1: Configure IAM Roles

Create four IAM policies and roles that Valohai needs to manage resources.

Create IAM Policies

Navigate to AWS Console > IAM > Policies and create these four policies.

Important: Replace placeholders before saving:

  • <AWS-ACCOUNT-ID> with your 12-digit AWS account ID

  • valohai-data-<AWS-ACCOUNT-ID> with your bucket name, which embeds the same account ID (e.g., valohai-data-123456789012)
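The substitution can also be scripted rather than done by hand. This is a minimal sketch, assuming each policy below has been saved to a local file with the placeholder intact. The function name render_policy and any file names are illustrative and not part of Valohai's tooling.

```shell
# render_policy ACCOUNT_ID < template.json > policy.json
# Replaces every <AWS-ACCOUNT-ID> placeholder read from stdin.
render_policy() {
  local account_id="$1"
  # An AWS account ID is exactly 12 digits; refuse anything else.
  case "$account_id" in
    *[!0-9]*|"") echo "account ID must be 12 digits" >&2; return 1 ;;
  esac
  if [ "${#account_id}" -ne 12 ]; then
    echo "account ID must be 12 digits" >&2
    return 1
  fi
  sed "s/<AWS-ACCOUNT-ID>/${account_id}/g"
}
```

For example, render_policy 123456789012 < ValohaiMasterPolicy.template.json > ValohaiMasterPolicy.json writes a copy that is ready to paste or upload.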

ValohaiQueuePolicy

Allows the queue instance to read secrets from Secrets Manager.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "0",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetResourcePolicy",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "secretsmanager:ResourceTag/valohai": "1"
        }
      }
    },
    {
      "Sid": "1",
      "Effect": "Allow",
      "Action": "secretsmanager:GetRandomPassword",
      "Resource": "*"
    }
  ]
}

ValohaiWorkerPolicy

Allows worker instances to describe themselves and to protect themselves from scale-in (scaledown) while they are running jobs.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "autoscaling:SetInstanceProtection",
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "1"
    },
    {
      "Action": "ec2:DescribeInstances",
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "2"
    }
  ]
}

ValohaiS3MultipartPolicy

Allows uploading large files (larger than 5 GB) to S3 using multipart uploads.

Replace <AWS-ACCOUNT-ID> in both Resource ARNs.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MultipartAccess",
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions",
        "s3:ListMultipartUploadParts",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>",
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>/*"
      ]
    }
  ]
}

ValohaiMasterPolicy

Allows Valohai to manage EC2 resources and access storage.

Replace both <AWS-ACCOUNT-ID> placeholders.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "2",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeVpcs",
        "ec2:DescribeKeyPairs",
        "ec2:DescribeImages",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:DescribeInstanceAttribute",
        "ec2:CreateTags",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeRouteTables",
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeScalingActivities"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowUpdatingSpotLaunchTemplates",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateLaunchTemplate",
        "ec2:CreateLaunchTemplateVersion",
        "ec2:ModifyLaunchTemplate",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "ec2:RebootInstances",
        "autoscaling:UpdateAutoScalingGroup",
        "autoscaling:CreateOrUpdateTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:CreateAutoScalingGroup"
      ],
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringEquals": {
          "aws:ResourceTag/valohai": "1"
        }
      }
    },
    {
      "Sid": "ServiceLinkedRole",
      "Effect": "Allow",
      "Action": "iam:CreateServiceLinkedRole",
      "Resource": "arn:aws:iam::*:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
    },
    {
      "Sid": "4",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole",
        "iam:GetRole"
      ],
      "Resource": "arn:aws:iam::<AWS-ACCOUNT-ID>:role/ValohaiWorkerRole"
    },
    {
      "Sid": "0",
      "Effect": "Allow",
      "Condition": {
        "StringEquals": {
          "secretsmanager:ResourceTag/valohai": "1"
        }
      },
      "Action": [
        "secretsmanager:GetResourcePolicy",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": "*"
    },
    {
      "Action": "secretsmanager:GetRandomPassword",
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "1"
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>",
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>/*"
      ]
    }
  ]
}
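If you are working from the CLI instead of the Console, the four policies can be created in a loop. This is a sketch that assumes each document above was saved as <PolicyName>.json in the current directory with its placeholders already replaced. The file names and the function name are assumptions.

```shell
# Creates the four policies from local JSON files (hypothetical file names).
create_valohai_policies() {
  local name
  for name in ValohaiQueuePolicy ValohaiWorkerPolicy \
              ValohaiS3MultipartPolicy ValohaiMasterPolicy; do
    aws iam create-policy \
      --policy-name "$name" \
      --policy-document "file://${name}.json"
  done
}
```

Call create_valohai_policies once your AWS CLI credentials are configured. Each call prints the new policy's ARN in its response.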

Create IAM Roles

Navigate to AWS Console > IAM > Roles and create these four roles.

| Role Name              | Use Case            | Attach Policy            | Additional Configuration           |
|------------------------|---------------------|--------------------------|------------------------------------|
| ValohaiQueueRole       | EC2                 | ValohaiQueuePolicy       | Instance profile auto-created      |
| ValohaiWorkerRole      | EC2                 | ValohaiWorkerPolicy      | Instance profile auto-created      |
| ValohaiS3MultipartRole | Another AWS Account | ValohaiS3MultipartPolicy | Account ID: <YOUR-AWS-ACCOUNT-ID>  |
| ValohaiMaster          | Another AWS Account | ValohaiMasterPolicy      | Account ID: 635691382966           |

For the ValohaiMaster role:

After creation, verify the trust relationship. It should look like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::635691382966:user/valohai-customer-<YOUR-IDENTIFIER>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

The Principal ARN must match the valohai_assume_user value provided by Valohai.
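The same check can be made from the CLI. This is a sketch, assuming the trust policy has a single statement as shown above. The function name check_master_trust is illustrative.

```shell
# check_master_trust EXPECTED_ARN
# Compares the role's trusted principal with the ARN Valohai gave you.
check_master_trust() {
  local expected="$1" actual
  actual=$(aws iam get-role --role-name ValohaiMaster \
    --query 'Role.AssumeRolePolicyDocument.Statement[0].Principal.AWS' \
    --output text) || return 1
  if [ "$actual" = "$expected" ]; then
    echo "trust principal OK"
  else
    echo "trust principal mismatch: got ${actual}" >&2
    return 1
  fi
}
```

Pass your valohai_assume_user value as the only argument. A non-zero exit status means the trust relationship needs editing.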

Important: ValohaiQueueRole and ValohaiWorkerRole need instance profiles. These are automatically created when using the Console. If using CLI, create them with:

aws iam create-instance-profile --instance-profile-name ValohaiQueueRole
aws iam add-role-to-instance-profile --instance-profile-name ValohaiQueueRole --role-name ValohaiQueueRole

aws iam create-instance-profile --instance-profile-name ValohaiWorkerRole
aws iam add-role-to-instance-profile --instance-profile-name ValohaiWorkerRole --role-name ValohaiWorkerRole
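Roles created through the CLI also need a trust policy that lets EC2 assume them, which the Console adds for you when you pick the EC2 use case. The sketch below shows one way to create such a role and attach its policy. The function names are illustrative, and it assumes the policies already exist in your account.

```shell
# Prints a trust policy that allows EC2 instances to assume the role.
ec2_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# create_ec2_role ROLE_NAME POLICY_NAME ACCOUNT_ID
create_ec2_role() {
  aws iam create-role --role-name "$1" \
    --assume-role-policy-document "$(ec2_trust_policy)"
  aws iam attach-role-policy --role-name "$1" \
    --policy-arn "arn:aws:iam::$3:policy/$2"
}
```

For example, create_ec2_role ValohaiWorkerRole ValohaiWorkerPolicy 123456789012, followed by the instance profile commands above.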

Step 2: Create VPC and Networking

Create VPC

Navigate to AWS Console > VPC > Your VPCs > Create VPC.

Settings:

  • Name: valohai-vpc

  • IPv4 CIDR: 10.0.0.0/16

  • No IPv6 CIDR

  • Tenancy: Default

  • Tags: Add Key=valohai, Value=1
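A CLI equivalent of these settings might look like the following sketch. The function name is an assumption. The command prints the new VPC ID, which later commands need.

```shell
# Creates valohai-vpc with the settings listed above and prints its ID.
create_valohai_vpc() {
  aws ec2 create-vpc \
    --cidr-block 10.0.0.0/16 \
    --instance-tenancy default \
    --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=valohai-vpc},{Key=valohai,Value=1}]' \
    --query 'Vpc.VpcId' --output text
}
```

Capture the result with VPC_ID=$(create_valohai_vpc) so it can be reused when creating subnets and security groups.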

Create Subnets

Create one subnet per availability zone in your region.

Navigate to VPC > Subnets > Create subnet.

Example for a region with 4 zones:

| Name             | Availability Zone | IPv4 CIDR    |
|------------------|-------------------|--------------|
| valohai-subnet-1 | <region>a         | 10.0.0.0/20  |
| valohai-subnet-2 | <region>b         | 10.0.16.0/20 |
| valohai-subnet-3 | <region>c         | 10.0.32.0/20 |
| valohai-subnet-4 | <region>d         | 10.0.48.0/20 |

Add Key=valohai, Value=1 tag to each subnet.
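The CIDR blocks in the example follow a simple pattern: each /20 covers 16 values of the third octet, so subnet n starts at 10.0.(16 × (n − 1)).0. The sketch below computes the blocks and creates one tagged subnet per zone. The function names are illustrative, and the zone letters you pass should match what your region actually offers.

```shell
# subnet_cidr INDEX -> the INDEX-th /20 block inside 10.0.0.0/16 (0-based)
subnet_cidr() {
  # A /16 holds sixteen /20 blocks, so valid indexes are 0..15.
  if [ "$1" -lt 0 ] || [ "$1" -gt 15 ]; then
    echo "index must be 0..15" >&2
    return 1
  fi
  echo "10.0.$(( $1 * 16 )).0/20"
}

# create_valohai_subnets VPC_ID REGION ZONE_LETTER...
create_valohai_subnets() {
  local vpc_id="$1" region="$2" i=0 letter
  shift 2
  for letter in "$@"; do
    aws ec2 create-subnet \
      --vpc-id "$vpc_id" \
      --availability-zone "${region}${letter}" \
      --cidr-block "$(subnet_cidr "$i")" \
      --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=valohai-subnet-$((i + 1))},{Key=valohai,Value=1}]"
    i=$((i + 1))
  done
}
```

For a four-zone region, create_valohai_subnets "$VPC_ID" <region> a b c d reproduces the table above.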

Create Internet Gateway

Navigate to VPC > Internet Gateways > Create internet gateway.

Settings:

  • Name: valohai-igw

  • Tags: Add Key=valohai, Value=1

After creation, attach it to valohai-vpc:

  • Actions > Attach to VPC > Select valohai-vpc

Configure Route Table

Navigate to VPC > Route Tables.

Find the main route table for valohai-vpc and rename it to valohai-rt.

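Edit routes so the table contains the following entries. The local route already exists by default, so only the 0.0.0.0/0 route needs to be added:

| Destination | Target      |
|-------------|-------------|
| 10.0.0.0/16 | local       |
| 0.0.0.0/0   | valohai-igw |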

Add Key=valohai, Value=1 tag to the route table.
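The gateway and default route can also be set up from the CLI. This is a sketch under the assumption that you already know the VPC ID and the ID of its main route table. The function name is illustrative.

```shell
# attach_valohai_igw VPC_ID ROUTE_TABLE_ID
attach_valohai_igw() {
  local vpc_id="$1" rtb_id="$2" igw_id
  # Create the tagged gateway and keep its ID for the next two calls.
  igw_id=$(aws ec2 create-internet-gateway \
    --tag-specifications 'ResourceType=internet-gateway,Tags=[{Key=Name,Value=valohai-igw},{Key=valohai,Value=1}]' \
    --query 'InternetGateway.InternetGatewayId' --output text) || return 1
  aws ec2 attach-internet-gateway \
    --internet-gateway-id "$igw_id" --vpc-id "$vpc_id"
  # Send all non-local traffic through the gateway.
  aws ec2 create-route --route-table-id "$rtb_id" \
    --destination-cidr-block 0.0.0.0/0 --gateway-id "$igw_id"
}
```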

Step 3: Create Security Groups

valohai-sg-workers

Navigate to EC2 > Security Groups > Create security group.

Basic details:

  • Name: valohai-sg-workers

  • Description: Security group for Valohai worker instances

  • VPC: valohai-vpc

Inbound rules:

  • None by default (add SSH from your IP if needed for debugging)

Outbound rules:

  • Type: All traffic

  • Destination: 0.0.0.0/0

Tags: Add Key=valohai, Value=1

valohai-sg-queue

Basic details:

  • Name: valohai-sg-queue

  • Description: Security group for Valohai queue instance

  • VPC: valohai-vpc

Inbound rules:

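| Type       | Protocol | Port  | Source             | Description                        |
|------------|----------|-------|--------------------|------------------------------------|
| Custom TCP | TCP      | 80    | 0.0.0.0/0          | Let's Encrypt HTTP challenge       |
| Custom TCP | TCP      | 63790 | 34.248.245.191/32  | Redis from app.valohai.com         |
| Custom TCP | TCP      | 63790 | 63.34.156.112/32   | Redis from Valohai scaling service |
| Custom TCP | TCP      | 63790 | valohai-sg-workers | Redis from workers                 |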

Outbound rules:

  • Type: All traffic

  • Destination: 0.0.0.0/0

Tags: Add Key=valohai, Value=1
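The four inbound rules for valohai-sg-queue translate to the CLI roughly as follows. This is a sketch that assumes both security groups already exist and that you pass their IDs. The function name is illustrative.

```shell
# open_queue_ports QUEUE_SG_ID WORKER_SG_ID
open_queue_ports() {
  local queue_sg="$1" worker_sg="$2" cidr
  # Port 80 for the Let's Encrypt HTTP challenge
  aws ec2 authorize-security-group-ingress --group-id "$queue_sg" \
    --protocol tcp --port 80 --cidr 0.0.0.0/0
  # Redis (63790) from the two Valohai addresses listed above
  for cidr in 34.248.245.191/32 63.34.156.112/32; do
    aws ec2 authorize-security-group-ingress --group-id "$queue_sg" \
      --protocol tcp --port 63790 --cidr "$cidr"
  done
  # Redis (63790) from any instance in the worker security group
  aws ec2 authorize-security-group-ingress --group-id "$queue_sg" \
    --protocol tcp --port 63790 --source-group "$worker_sg"
}
```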

Step 4: Create Secrets Manager Secret

Navigate to AWS Console > Secrets Manager > Store a new secret.

Secret type:

  • Other type of secret

Key/value:

  • Plaintext tab

  • Generate a strong password with uppercase, lowercase letters, and numbers (no special characters)

Secret name: valohai_redis_server

Tags: Add Key=valohai, Value=1

Rotation: Disable automatic rotation

Save the secret and note the password. The queue instance setup in Step 6 reads it from Secrets Manager.
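One way to produce a password that meets the "letters and numbers only" requirement is to filter random bytes locally. This is a sketch: the function names and the default length of 32 are assumptions, not Valohai requirements.

```shell
# generate_redis_password [LENGTH]
# Prints a random password made only of A-Z, a-z and 0-9.
generate_redis_password() {
  local length="${1:-32}"
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
  echo
}

# Stores a freshly generated password as the tagged plaintext secret.
store_redis_password() {
  aws secretsmanager create-secret \
    --name valohai_redis_server \
    --secret-string "$(generate_redis_password)" \
    --tags Key=valohai,Value=1
}
```

Because the charset excludes quotes and backslashes, the value survives the plain-text extraction in the queue's user data script unchanged.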

Step 5: Allocate Elastic IP

Navigate to EC2 > Elastic IPs > Allocate Elastic IP address.

Settings:

  • Network Border Group: Default

  • Public IPv4 address pool: Amazon's pool of IPv4 addresses

  • Tags: Add Key=Name, Value=valohai-ip-queue and Key=valohai, Value=1

Allocate the IP. You'll associate it with the queue instance in the next step.
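From the CLI, the allocation might be done as in this sketch. The function name is illustrative. The command prints the allocation ID, which is needed later for the association.

```shell
# Allocates a tagged Elastic IP and prints its allocation ID.
allocate_queue_ip() {
  aws ec2 allocate-address \
    --domain vpc \
    --tag-specifications 'ResourceType=elastic-ip,Tags=[{Key=Name,Value=valohai-ip-queue},{Key=valohai,Value=1}]' \
    --query 'AllocationId' --output text
}
```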

Step 6: Create Queue Instance

Navigate to EC2 > Instances > Launch instance.

Basic Configuration

Name: valohai-queue

Application and OS Images:

  • Ubuntu Server 24.04 LTS

  • Architecture: 64-bit (x86)

Instance type: t3.medium

Key pair: Select your existing key pair

Network Settings

VPC: valohai-vpc

Subnet: Any subnet (e.g., valohai-subnet-1)

Auto-assign public IP: Disable (we'll use Elastic IP)

Firewall (security groups): Select valohai-sg-queue

Storage

Root volume:

  • Size: 32 GiB

  • Volume type: gp3 (General Purpose SSD)

  • Encrypted: Optional (recommended)

Advanced Details

IAM instance profile: ValohaiQueueRole

User data: Paste this script, replacing <queue_address> with your actual queue address from Valohai:

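#!/bin/bash
# Install the AWS CLI, which is used below to read the Redis password
sudo apt-get update && sudo apt-get install awscli -y
# Request an IMDSv2 session token, then use it to look up this instance's region
export TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
export REGION=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/region/)
# Read the Redis password from the valohai_redis_server secret created in Step 4
export PASSWORD=$(aws secretsmanager get-secret-value --secret-id valohai_redis_server --region "$REGION" | sed -n 's|.*"SecretString": *"\([^"]*\)".*|\1|p')
export QUEUE=<queue_address>
# Download and run the Valohai queue setup script with the queue address and password
curl https://raw.githubusercontent.com/valohai/worker-queue/main/host/setup.sh | sudo QUEUE_ADDRESS="$QUEUE" REDIS_PASSWORD="$PASSWORD" bash
unset PASSWORD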

Tags: Add Key=valohai, Value=1

Launch the instance.
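The launch settings above map onto a single run-instances call. The sketch below makes several assumptions: the user data script is saved locally as queue-user-data.sh (a hypothetical file name), you look up the Ubuntu Server 24.04 AMI ID for your region yourself, and the AMI's root device is /dev/sda1. Verify the device name against your AMI.

```shell
# launch_valohai_queue AMI_ID KEY_NAME SUBNET_ID QUEUE_SG_ID
# Prints the ID of the launched instance.
launch_valohai_queue() {
  aws ec2 run-instances \
    --image-id "$1" \
    --instance-type t3.medium \
    --key-name "$2" \
    --subnet-id "$3" \
    --security-group-ids "$4" \
    --no-associate-public-ip-address \
    --iam-instance-profile Name=ValohaiQueueRole \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=32,VolumeType=gp3}' \
    --user-data file://queue-user-data.sh \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=valohai-queue},{Key=valohai,Value=1}]' \
    --query 'Instances[0].InstanceId' --output text
}
```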

Associate Elastic IP

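After the instance is running, associate the Elastic IP promptly. The instance has no internet access until it has a public address, so the user data script may fail if it runs before the association (check /var/log/cloud-init-output.log, as described under Troubleshooting):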

  1. Navigate to EC2 > Elastic IPs

  2. Select your valohai-ip-queue

  3. Actions > Associate Elastic IP address

  4. Select the valohai-queue instance

  5. Associate
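The same association can be scripted. This sketch waits for the instance to reach the running state first. The function name is illustrative, and it assumes you have the instance ID and the Elastic IP's allocation ID at hand.

```shell
# associate_queue_ip INSTANCE_ID ALLOCATION_ID
associate_queue_ip() {
  # Block until EC2 reports the instance as running.
  aws ec2 wait instance-running --instance-ids "$1" || return 1
  aws ec2 associate-address --instance-id "$1" --allocation-id "$2"
}
```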

Step 7: Create S3 Bucket

Navigate to S3 > Create bucket.

Bucket name: valohai-data-<AWS-ACCOUNT-ID>

  • Replace <AWS-ACCOUNT-ID> with your 12-digit account ID

  • Example: valohai-data-123456789012

Region: Same as your EC2 instances

Block Public Access: Enable (all checkboxes)

Bucket Versioning: Disabled

Tags: Add Key=valohai, Value=1

Encryption: Enable (AWS managed keys)

Create the bucket.
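A CLI version of the bucket setup might look like the sketch below. The function name is illustrative. Note that us-east-1 is a special case: create-bucket there must be called without a LocationConstraint.

```shell
# create_valohai_bucket ACCOUNT_ID REGION
create_valohai_bucket() {
  local bucket="valohai-data-$1" region="$2"
  if [ "$region" = "us-east-1" ]; then
    # us-east-1 rejects an explicit LocationConstraint
    aws s3api create-bucket --bucket "$bucket" --region "$region"
  else
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region"
  fi
  # Turn on all four Block Public Access settings
  aws s3api put-public-access-block --bucket "$bucket" \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
  aws s3api put-bucket-tagging --bucket "$bucket" \
    --tagging 'TagSet=[{Key=valohai,Value=1}]'
}
```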

Configure CORS

After creating the bucket:

  1. Open the bucket

  2. Navigate to Permissions tab

  3. Scroll to Cross-origin resource sharing (CORS)

  4. Click Edit and paste:

[
  {
    "AllowedHeaders": ["Authorization"],
    "AllowedMethods": ["GET"],
    "AllowedOrigins": ["*"],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000
  },
  {
    "AllowedHeaders": ["Authorization"],
    "AllowedMethods": ["POST"],
    "AllowedOrigins": ["https://app.valohai.com"],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000
  }
]

Save changes.
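If you apply CORS through the CLI instead, be aware that the Console editor takes the bare JSON array shown above, while put-bucket-cors expects that array wrapped in a "CORSRules" object. This is a minimal sketch, assuming the array was saved to a local file. The function and file names are illustrative.

```shell
# wrap_cors_rules < console-cors.json
# Wraps the Console-style array from stdin as {"CORSRules": [...]}.
wrap_cors_rules() {
  printf '{"CORSRules": '
  cat
  printf '}\n'
}

# apply_bucket_cors BUCKET CORS_FILE
apply_bucket_cors() {
  aws s3api put-bucket-cors --bucket "$1" \
    --cors-configuration "$(wrap_cors_rules < "$2")"
}
```

For example, apply_bucket_cors valohai-data-123456789012 console-cors.json.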

Step 8: Verify and Share Information

Collect Required Information

Gather these values to send to Valohai:

Account and Location:

  • AWS Account ID: ____________

  • Region: ____________

IAM:

  • ValohaiMaster Role ARN: arn:aws:iam::<account-id>:role/ValohaiMaster

Networking:

  • VPC ID: vpc-____________

  • Subnet IDs: subnet-________, subnet-________, ...

Queue Instance:

  • Private IP: ____________

  • Public IP (Elastic IP): ____________

Storage:

  • S3 Bucket Name: valohai-data-<account-id>
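Several of these values can be read back with the CLI rather than copied from the Console. The sketch below assumes the resource names and tags used in this guide and that your CLI has a default region configured. The function name is illustrative, and the queue IPs are left for you to fill in.

```shell
# Prints the account, IAM, and networking values from the checklist above.
collect_valohai_info() {
  echo "AWS Account ID: $(aws sts get-caller-identity --query Account --output text)"
  echo "Region: $(aws configure get region)"
  echo "ValohaiMaster Role ARN: $(aws iam get-role --role-name ValohaiMaster --query Role.Arn --output text)"
  echo "VPC ID: $(aws ec2 describe-vpcs --filters Name=tag:Name,Values=valohai-vpc --query 'Vpcs[0].VpcId' --output text)"
  echo "Subnet IDs: $(aws ec2 describe-subnets --filters Name=tag:valohai,Values=1 --query 'Subnets[].SubnetId' --output text)"
}
```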

Send to Valohai

Email this information to your Valohai contact at [email protected] using your organization's secure communication method (e.g., password-protected document, encrypted email).

Verify Queue Instance

SSH into the queue instance to verify it's running correctly:

ssh -i your-key.pem ubuntu@<elastic-ip>
sudo systemctl status valohai-queue

The service should be reported as active (running). Check logs with:

sudo journalctl -u valohai-queue -f

Troubleshooting

Queue instance not accessible

Check security group rules:

aws ec2 describe-security-groups --group-ids <sg-id>

Verify port 63790 is open from 34.248.245.191/32 and 63.34.156.112/32.

Check instance status:

aws ec2 describe-instance-status --instance-ids <instance-id>

Verify user data script ran:

ssh -i your-key.pem ubuntu@<elastic-ip>
cat /var/log/cloud-init-output.log

IAM role issues

Verify trust relationship:

aws iam get-role --role-name ValohaiMaster

Check the AssumeRolePolicyDocument contains the correct Valohai ARN.

Check instance profile:

aws iam get-instance-profile --instance-profile-name ValohaiWorkerRole

S3 access errors

Verify the bucket's CORS configuration:

aws s3api get-bucket-cors --bucket valohai-data-<account-id>

Test worker access:

# From a worker instance
aws s3 ls s3://valohai-data-<account-id>/

Cannot reach queue from workers

Check network connectivity:

# From a worker instance
telnet <queue-private-ip> 63790
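If telnet is not installed on the worker, bash's built-in /dev/tcp redirection gives the same reachability answer. This is a sketch that assumes bash and the coreutils timeout command are available. The function name check_port is illustrative.

```shell
# check_port HOST PORT [TIMEOUT_SECONDS]
# Succeeds if a TCP connection to HOST:PORT can be opened in time.
check_port() {
  local host="$1" port="$2" timeout_s="${3:-5}"
  if timeout "$timeout_s" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}"
    return 1
  fi
}
```

Run check_port <queue-private-ip> 63790 from a worker. A non-zero exit status points to a security group or routing problem.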

Verify security group:

  • Ensure valohai-sg-queue allows inbound from valohai-sg-workers

Getting Help

Before contacting support, collect:

  • CloudTrail logs for any permission errors

  • Security group configurations

  • IAM role trust relationships

  • Queue instance logs (/var/log/cloud-init-output.log)

Contact: [email protected]

Include in your message:

  • AWS Account ID

  • Region

  • Specific error messages

  • Steps you've already tried
