Hybrid Deployment - Manual Setup
Step-by-step guide to manually deploy Valohai workers to AWS using the Console or CLI
This guide walks through manually creating all AWS resources for a Valohai hybrid deployment. Use this if you can't use CloudFormation or Terraform, or need specific customizations.
Prefer automation? Use Terraform or CloudFormation for faster, repeatable deployments.
Prerequisites
From Valohai:
valohai_assume_user - ARN of the Valohai user (e.g., arn:aws:iam::635691382966:user/valohai-customer-yourcompany)
queue_address - DNS name for your queue (e.g., something.vqueue.net)
From your AWS account:
Admin access to AWS Console or CLI
Region selected (consider GPU availability)
EC2 key pair for SSH access
Contact Valohai support to receive your credentials before proceeding.
Step 1: Configure IAM Roles
Create four IAM policies and roles that Valohai needs to manage resources.
Create IAM Policies
Navigate to AWS Console > IAM > Policies and create these four policies.
Important: Replace placeholders before saving:
<AWS-ACCOUNT-ID> with your 12-digit AWS account ID
valohai-data-<AWS-ACCOUNT-ID> with your account ID (e.g., valohai-data-123456789012)
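The substitution can also be scripted rather than done by hand. The following is a minimal sketch, assuming each policy below has been saved to a local file with its placeholders intact; the function name render_policy and the file names are hypothetical and not part of the Valohai tooling.

```shell
# Hypothetical helper: replace the <AWS-ACCOUNT-ID> placeholder in a policy file.
# Usage: render_policy <account-id> <template-file> > <output-file>
render_policy() {
  account_id="$1"
  template="$2"
  # Reject anything that is empty or contains a non-digit character.
  case "$account_id" in
    ''|*[!0-9]*) echo "account ID must be numeric" >&2; return 1 ;;
  esac
  # AWS account IDs are exactly 12 digits long.
  if [ "${#account_id}" -ne 12 ]; then
    echo "account ID must have 12 digits" >&2
    return 1
  fi
  sed "s/<AWS-ACCOUNT-ID>/${account_id}/g" "$template"
}
```

For example, render_policy 123456789012 master-policy.template.json > master-policy.json would produce a file ready to paste into the Console.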
ValohaiQueuePolicy
Allows the queue instance to read secrets from Secrets Manager.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "0",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetResourcePolicy",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "secretsmanager:ResourceTag/valohai": "1"
        }
      }
    },
    {
      "Sid": "1",
      "Effect": "Allow",
      "Action": "secretsmanager:GetRandomPassword",
      "Resource": "*"
    }
  ]
}
ValohaiWorkerPolicy
Allows worker instances to describe themselves and protect from scaledown.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "autoscaling:SetInstanceProtection",
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "1"
    },
    {
      "Action": "ec2:DescribeInstances",
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "2"
    }
  ]
}
ValohaiS3MultipartPolicy
Allows uploading large files (>5GB) to S3.
Replace <AWS-ACCOUNT-ID> in both Resource ARNs.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MultipartAccess",
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions",
        "s3:ListMultipartUploadParts",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>",
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>/*"
      ]
    }
  ]
}
ValohaiMasterPolicy
Allows Valohai to manage EC2 resources and access storage.
Replace both <AWS-ACCOUNT-ID> placeholders.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "2",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeVpcs",
        "ec2:DescribeKeyPairs",
        "ec2:DescribeImages",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:DescribeInstanceAttribute",
        "ec2:CreateTags",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeRouteTables",
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeScalingActivities"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowUpdatingSpotLaunchTemplates",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateLaunchTemplate",
        "ec2:CreateLaunchTemplateVersion",
        "ec2:ModifyLaunchTemplate",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "ec2:RebootInstances",
        "autoscaling:UpdateAutoScalingGroup",
        "autoscaling:CreateOrUpdateTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:CreateAutoScalingGroup"
      ],
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringEquals": {
          "aws:ResourceTag/valohai": "1"
        }
      }
    },
    {
      "Sid": "ServiceLinkedRole",
      "Effect": "Allow",
      "Action": "iam:CreateServiceLinkedRole",
      "Resource": "arn:aws:iam::*:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
    },
    {
      "Sid": "4",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole",
        "iam:GetRole"
      ],
      "Resource": "arn:aws:iam::<AWS-ACCOUNT-ID>:role/ValohaiWorkerRole"
    },
    {
      "Sid": "0",
      "Effect": "Allow",
      "Condition": {
        "StringEquals": {
          "secretsmanager:ResourceTag/valohai": "1"
        }
      },
      "Action": [
        "secretsmanager:GetResourcePolicy",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": "*"
    },
    {
      "Action": "secretsmanager:GetRandomPassword",
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "1"
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>",
        "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>/*"
      ]
    }
  ]
}
Create IAM Roles
Navigate to AWS Console > IAM > Roles and create these four roles.
ValohaiQueueRole - Trusted entity: EC2. Policy: ValohaiQueuePolicy. Notes: Instance profile auto-created
ValohaiWorkerRole - Trusted entity: EC2. Policy: ValohaiWorkerPolicy. Notes: Instance profile auto-created
ValohaiS3MultipartRole - Trusted entity: Another AWS Account. Policy: ValohaiS3MultipartPolicy. Notes: Account ID: <YOUR-AWS-ACCOUNT-ID>
ValohaiMaster - Trusted entity: Another AWS Account. Policy: ValohaiMasterPolicy. Notes: Account ID: 635691382966
For ValohaiMaster role:
After creation, verify the trust relationship. It should look like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::635691382966:user/valohai-customer-<YOUR-IDENTIFIER>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
The ARN here matches the valohai_assume_user provided by Valohai.
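If you work from the CLI instead of the Console, the ValohaiMaster role and its trust relationship can be created in one pass. This is a minimal sketch, not an official procedure: the file name valohai-master-trust.json and the run helper are hypothetical, and the example ARN must be replaced with the valohai_assume_user value you received. By default the script only prints the aws commands; set DRY_RUN=0 to execute them.

```shell
# Assumed value: replace with the valohai_assume_user ARN received from Valohai.
VALOHAI_ASSUME_USER="arn:aws:iam::635691382966:user/valohai-customer-yourcompany"
TRUST_FILE="valohai-master-trust.json"   # hypothetical file name

# Write the trust policy shown above with the principal filled in.
cat > "$TRUST_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "$VALOHAI_ASSUME_USER"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Print the commands by default; set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Create the role with the trust policy, then attach the policy created earlier.
run aws iam create-role --role-name ValohaiMaster \
  --assume-role-policy-document "file://$TRUST_FILE"
run aws iam attach-role-policy --role-name ValohaiMaster \
  --policy-arn "arn:aws:iam::<AWS-ACCOUNT-ID>:policy/ValohaiMasterPolicy"
```

The policy ARN still contains the <AWS-ACCOUNT-ID> placeholder, which must be replaced before running with DRY_RUN=0.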
Important: ValohaiQueueRole and ValohaiWorkerRole need instance profiles. These are automatically created when using the Console. If using CLI, create them with:
aws iam create-instance-profile --instance-profile-name ValohaiQueueRole
aws iam add-role-to-instance-profile --instance-profile-name ValohaiQueueRole --role-name ValohaiQueueRole
aws iam create-instance-profile --instance-profile-name ValohaiWorkerRole
aws iam add-role-to-instance-profile --instance-profile-name ValohaiWorkerRole --role-name ValohaiWorkerRole
Step 2: Create VPC and Networking
Create VPC
Navigate to AWS Console > VPC > Your VPCs > Create VPC.
Settings:
Name: valohai-vpc
IPv4 CIDR: 10.0.0.0/16
No IPv6 CIDR
Tenancy: Default
Tags: Add Key=valohai, Value=1
Create Subnets
Create one subnet per availability zone in your region.
Navigate to VPC > Subnets > Create subnet.
Example for a region with 4 zones:
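valohai-subnet-1 - Availability Zone: <region>a, IPv4 CIDR: 10.0.0.0/20
valohai-subnet-2 - Availability Zone: <region>b, IPv4 CIDR: 10.0.16.0/20
valohai-subnet-3 - Availability Zone: <region>c, IPv4 CIDR: 10.0.32.0/20
valohai-subnet-4 - Availability Zone: <region>d, IPv4 CIDR: 10.0.48.0/20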
Add Key=valohai, Value=1 tag to each subnet.
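The example CIDRs follow a simple pattern: a /20 block spans 16 values of the third octet, so subnet n starts at 10.0.(16 x (n - 1)).0. The sketch below prints the plan for a given number of zones. The function name subnet_plan is hypothetical, and the a, b, c, ... zone suffixes are an assumption carried over from the example; check the actual zone names in your region.

```shell
# Print one /20 subnet per availability zone inside 10.0.0.0/16.
# Assumption: zones are suffixed a, b, c, ... as in the example above.
subnet_plan() {
  zone_count="$1"
  i=0
  for suffix in a b c d e f; do
    [ "$i" -ge "$zone_count" ] && break
    # A /20 spans 16 values of the third octet: 0, 16, 32, 48, ...
    echo "valohai-subnet-$((i + 1)) <region>${suffix} 10.0.$((i * 16)).0/20"
    i=$((i + 1))
  done
}

subnet_plan 4
```

Each output line gives the subnet name, the zone, and the CIDR block to enter in the Create subnet form.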
Create Internet Gateway
Navigate to VPC > Internet Gateways > Create internet gateway.
Settings:
Name: valohai-igw
Tags: Add Key=valohai, Value=1
After creation, attach it to valohai-vpc:
Actions > Attach to VPC > Select valohai-vpc
Configure Route Table
Navigate to VPC > Route Tables.
Find the main route table for valohai-vpc and rename it to valohai-rt.
Edit routes to add:
Destination: 10.0.0.0/16, Target: local
Destination: 0.0.0.0/0, Target: valohai-igw
Add Key=valohai, Value=1 tag to the route table.
Step 3: Create Security Groups
valohai-sg-workers
Navigate to EC2 > Security Groups > Create security group.
Basic details:
Name: valohai-sg-workers
Description: Security group for Valohai worker instances
VPC: valohai-vpc
Inbound rules:
None by default (add SSH from your IP if needed for debugging)
Outbound rules:
Type: All traffic
Destination: 0.0.0.0/0
Tags: Add Key=valohai, Value=1
valohai-sg-queue
Basic details:
Name: valohai-sg-queue
Description: Security group for Valohai queue instance
VPC: valohai-vpc
Inbound rules:
Type: Custom TCP, Protocol: TCP, Port: 80, Source: 0.0.0.0/0, Description: Let's Encrypt HTTP challenge
Type: Custom TCP, Protocol: TCP, Port: 63790, Source: 34.248.245.191/32, Description: Redis from app.valohai.com
Type: Custom TCP, Protocol: TCP, Port: 63790, Source: 63.34.156.112/32, Description: Redis from Valohai scaling service
Type: Custom TCP, Protocol: TCP, Port: 63790, Source: valohai-sg-workers, Description: Redis from workers
Outbound rules:
Type: All traffic
Destination: 0.0.0.0/0
Tags: Add Key=valohai, Value=1
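The four inbound rules for valohai-sg-queue can also be added from the CLI. This is a rough sketch under stated assumptions: both security group IDs are placeholders for the IDs created in your account, and the run helper is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set.

```shell
# Hypothetical security group IDs: replace with the IDs from your account.
QUEUE_SG="sg-0123456789abcdef0"
WORKERS_SG="sg-0fedcba9876543210"
REDIS_PORT=63790

# Print the commands by default; set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Let's Encrypt HTTP challenge.
run aws ec2 authorize-security-group-ingress --group-id "$QUEUE_SG" \
  --protocol tcp --port 80 --cidr 0.0.0.0/0

# Redis from app.valohai.com and from the Valohai scaling service.
for cidr in 34.248.245.191/32 63.34.156.112/32; do
  run aws ec2 authorize-security-group-ingress --group-id "$QUEUE_SG" \
    --protocol tcp --port "$REDIS_PORT" --cidr "$cidr"
done

# Redis from the workers, referenced by security group rather than by CIDR.
run aws ec2 authorize-security-group-ingress --group-id "$QUEUE_SG" \
  --protocol tcp --port "$REDIS_PORT" --source-group "$WORKERS_SG"
```

Referencing valohai-sg-workers as the source, instead of a CIDR range, keeps the rule valid as worker instances come and go.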
Step 4: Create Secrets Manager Secret
Navigate to AWS Console > Secrets Manager > Store a new secret.
Secret type: Other type of secret
Key/value: Plaintext tab
Generate a strong password with uppercase, lowercase letters, and numbers (no special characters)
Secret name: valohai_redis_server
Tags: Add Key=valohai, Value=1
Rotation: Disable automatic rotation
Save the secret and note the password for the next step.
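One way to produce a suitable password is to filter random bytes down to letters and digits. This is a minimal sketch; the function name generate_password and the 32-character length are assumptions, not Valohai requirements. Random selection does not guarantee at least one character from each class, so regenerate if your policy demands that.

```shell
# Generate a 32-character password from letters and digits only.
# Reads a fixed amount of random data so the pipeline ends cleanly.
generate_password() {
  head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c 32
}

REDIS_PASSWORD="$(generate_password)"
echo "generated ${#REDIS_PASSWORD} characters"

# To store it from the CLI instead of the Console (requires AWS credentials):
#   aws secretsmanager create-secret --name valohai_redis_server \
#     --secret-string "$REDIS_PASSWORD" --tags Key=valohai,Value=1
```

Avoiding special characters matters here because the user data script in Step 6 extracts the value with a simple sed pattern that stops at the first double quote.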
Step 5: Allocate Elastic IP
Navigate to EC2 > Elastic IPs > Allocate Elastic IP address.
Settings:
Network Border Group: Default
Public IPv4 address pool: Amazon's pool of IPv4 addresses
Tags: Name=valohai-ip-queue; Key=valohai, Value=1
Allocate the IP. You'll associate it with the queue instance in the next step.
Step 6: Create Queue Instance
Navigate to EC2 > Instances > Launch instance.
Basic Configuration
Name: valohai-queue
Application and OS Images: Ubuntu Server 24.04 LTS
Architecture: 64-bit (x86)
Instance type: t3.medium
Key pair: Select your existing key pair
Network Settings
VPC: valohai-vpc
Subnet: Any subnet (e.g., valohai-subnet-1)
Auto-assign public IP: Disable (we'll use Elastic IP)
Firewall (security groups): Select valohai-sg-queue
Storage
Root volume:
Size: 32 GiB
Volume type: gp3 (General Purpose SSD)
Encrypted: Optional (recommended)
Advanced Details
IAM instance profile: ValohaiQueueRole
User data: Paste this script, replacing <queue_address> with your actual queue address from Valohai:
#!/bin/bash
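# Install the AWS CLI; fall back to the snap if apt has no awscli package (assumed to be the case on Ubuntu 24.04)
sudo apt-get update && sudo apt-get install awscli -y || sudo snap install aws-cli --classic
export PATH="$PATH:/snap/bin"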
export TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`
export REGION=`curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/region/`
export PASSWORD=`aws secretsmanager get-secret-value --secret-id valohai_redis_server --region $REGION | sed -n 's|.*"SecretString": *"\([^"]*\)".*|\1|p'`
export QUEUE=<queue_address>
curl https://raw.githubusercontent.com/valohai/worker-queue/main/host/setup.sh | sudo QUEUE_ADDRESS=$QUEUE REDIS_PASSWORD=$PASSWORD bash
unset PASSWORD
Tags: Add Key=valohai, Value=1
Launch the instance.
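The sed expression in the user data script pulls the password out of the JSON printed by aws secretsmanager get-secret-value. The sketch below runs the same expression against a made-up sample response so you can see what it extracts; the ARN and password values are fabricated for illustration.

```shell
# Sample response shaped like get-secret-value output; all values are made up.
SAMPLE='{
    "ARN": "arn:aws:secretsmanager:eu-west-1:123456789012:secret:valohai_redis_server-AbCdEf",
    "Name": "valohai_redis_server",
    "SecretString": "Abc123Def456",
    "VersionStages": [
        "AWSCURRENT"
    ]
}'

# Same expression as in the user data script: print only the SecretString value.
PASSWORD=$(printf '%s\n' "$SAMPLE" | sed -n 's|.*"SecretString": *"\([^"]*\)".*|\1|p')
echo "$PASSWORD"   # prints Abc123Def456
```

The pattern captures everything up to the next double quote, which is why the secret should contain only letters and digits. The AWS CLI's --query SecretString --output text options are an alternative way to get the bare value without parsing JSON by hand.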
Associate Elastic IP
After the instance is running:
Navigate to EC2 > Elastic IPs
Select your valohai-ip-queue
Actions > Associate Elastic IP address
Select the valohai-queue instance
Associate
Step 7: Create S3 Bucket
Navigate to S3 > Create bucket.
Bucket name: valohai-data-<AWS-ACCOUNT-ID>
Replace <AWS-ACCOUNT-ID> with your 12-digit account ID
Example: valohai-data-123456789012
Region: Same as your EC2 instances
Block Public Access: Enable (all checkboxes)
Bucket Versioning: Disabled
Tags: Add Key=valohai, Value=1
Encryption: Enable (AWS managed keys)
Create the bucket.
Configure CORS
After creating the bucket:
Open the bucket
Navigate to Permissions tab
Scroll to Cross-origin resource sharing (CORS)
Click Edit and paste:
[
  {
    "AllowedHeaders": ["Authorization"],
    "AllowedMethods": ["GET"],
    "AllowedOrigins": ["*"],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000
  },
  {
    "AllowedHeaders": ["Authorization"],
    "AllowedMethods": ["POST"],
    "AllowedOrigins": ["https://app.valohai.com"],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000
  }
]
Save changes.
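The same CORS rules can be applied from the CLI. Note one difference from the Console: aws s3api put-bucket-cors expects the rule list wrapped in a "CORSRules" key. The sketch below assumes the example bucket name from above; the file name valohai-cors.json and the run helper are hypothetical, and commands are only printed unless DRY_RUN=0 is set.

```shell
BUCKET="valohai-data-123456789012"   # example name from above: use your own
CORS_FILE="valohai-cors.json"        # hypothetical file name

# The CLI expects the rule list wrapped in a "CORSRules" key.
cat > "$CORS_FILE" <<'EOF'
{
  "CORSRules": [
    {
      "AllowedHeaders": ["Authorization"],
      "AllowedMethods": ["GET"],
      "AllowedOrigins": ["*"],
      "ExposeHeaders": [],
      "MaxAgeSeconds": 3000
    },
    {
      "AllowedHeaders": ["Authorization"],
      "AllowedMethods": ["POST"],
      "AllowedOrigins": ["https://app.valohai.com"],
      "ExposeHeaders": [],
      "MaxAgeSeconds": 3000
    }
  ]
}
EOF

# Print the commands by default; set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Apply the configuration, then read it back to confirm.
run aws s3api put-bucket-cors --bucket "$BUCKET" \
  --cors-configuration "file://$CORS_FILE"
run aws s3api get-bucket-cors --bucket "$BUCKET"
```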
Step 8: Verify and Share Information
Collect Required Information
Gather these values to send to Valohai:
Subscription and Location:
AWS Account ID: ____________
Region: ____________
IAM:
ValohaiMaster Role ARN: arn:aws:iam::<account-id>:role/ValohaiMaster
Networking:
VPC ID: vpc-____________
Subnet IDs: subnet-________, subnet-________, ...
Queue Instance:
Private IP: ____________
Public IP (Elastic IP): ____________
Storage:
S3 Bucket Name: valohai-data-<account-id>
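Most of these values can be looked up with the CLI rather than copied from the Console. This is a sketch under the assumption that every resource was tagged as described in the earlier steps (valohai=1, and Name=valohai-ip-queue on the Elastic IP). The run helper is hypothetical and only prints each command unless DRY_RUN=0 is set; the printed form drops shell quoting, so copy the commands from the script rather than from its output.

```shell
# Print the commands by default; set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# AWS account ID of the current credentials.
run aws sts get-caller-identity --query Account --output text

# VPC and subnet IDs, found through the valohai=1 tag added earlier.
run aws ec2 describe-vpcs --filters Name=tag:valohai,Values=1 \
  --query 'Vpcs[].VpcId' --output text
run aws ec2 describe-subnets --filters Name=tag:valohai,Values=1 \
  --query 'Subnets[].SubnetId' --output text

# Elastic IP and the private IP it is associated with.
run aws ec2 describe-addresses --filters Name=tag:Name,Values=valohai-ip-queue \
  --query 'Addresses[].[PublicIp,PrivateIpAddress]' --output text
```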
Send to Valohai
Send this information to your Valohai contact using your organization's secure communication method (e.g., password-protected document, encrypted email).
Verify Queue Instance
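SSH into the queue instance to verify it's running correctly. The inbound rules for valohai-sg-queue listed in Step 3 do not include SSH, so temporarily allow port 22 from your IP first: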
ssh -i your-key.pem ubuntu@<elastic-ip>
sudo systemctl status valohai-queue
You should see the service is active and running. Check logs with:
sudo journalctl -u valohai-queue -f
Troubleshooting
Queue instance not accessible
Check security group rules:
aws ec2 describe-security-groups --group-ids <sg-id>
Verify port 63790 is open from 34.248.245.191/32 and 63.34.156.112/32.
Check instance status:
aws ec2 describe-instance-status --instance-ids <instance-id>
Verify user data script ran:
ssh -i your-key.pem ubuntu@<elastic-ip>
cat /var/log/cloud-init-output.log
IAM role issues
Verify trust relationship:
aws iam get-role --role-name ValohaiMaster
Check the AssumeRolePolicyDocument contains the correct Valohai ARN.
Check instance profile:
aws iam get-instance-profile --instance-profile-name ValohaiWorkerRole
S3 access errors
Verify bucket policy and CORS:
aws s3api get-bucket-cors --bucket valohai-data-<account-id>
Test worker access:
# From a worker instance
aws s3 ls s3://valohai-data-<account-id>/
Cannot reach queue from workers
Check network connectivity:
# From a worker instance
telnet <queue-private-ip> 63790
Verify security group:
Ensure valohai-sg-queue allows inbound from valohai-sg-workers
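When telnet is not installed on the worker, a scripted check can stand in for it. The following is a sketch, assuming bash and the coreutils timeout command are available on the instance; the function name check_port and the sample IP address are hypothetical.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so it needs bash and coreutils timeout.
check_port() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

QUEUE_PRIVATE_IP="10.0.0.10"   # hypothetical address: use your queue's private IP
if check_port "$QUEUE_PRIVATE_IP" 63790; then
  echo "queue reachable"
else
  echo "queue NOT reachable: check valohai-sg-queue inbound rules"
fi
```

A failure here usually points at the security group or route table rather than at the queue service itself, since the check only tests that the TCP port accepts connections.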
Getting Help
Before contacting support, collect:
CloudTrail logs for any permission errors
Security group configurations
IAM role trust relationships
Queue instance logs (/var/log/cloud-init-output.log)
Contact: Valohai support
Include in your message:
AWS Account ID
Region
Specific error messages
Steps you've already tried
