Hybrid Deployment - Manual Setup

This guide walks through manually creating all AWS resources for a Valohai hybrid deployment. Use this if you can't use CloudFormation or Terraform, or need specific customizations.

Prefer automation? Use Terraform or CloudFormation for faster, repeatable deployments.

Prerequisites

From Valohai:

  • valohai_assume_user - ARN of the Valohai user (e.g., arn:aws:iam::635691382966:user/valohai-customer-yourcompany)

  • queue_address - DNS name for your queue (e.g., something.vqueue.net)

From your AWS account:

  • Admin access to AWS Console or CLI

  • Region selected (consider GPU availability)

  • EC2 key pair for SSH access

Contact Valohai support to receive your credentials before proceeding.

Step 1: Configure IAM Roles

Create the four IAM policies and four IAM roles that Valohai needs to manage resources.

Create IAM Policies

Navigate to AWS Console > IAM > Policies and create these four policies.

Important: Replace placeholders before saving:

  • <AWS-ACCOUNT-ID> with your 12-digit AWS account ID

  • valohai-data-<AWS-ACCOUNT-ID> with the bucket name that includes your account ID (e.g., valohai-data-123456789012)
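The placeholder replacement above is easy to get wrong by hand, so it can help to script it. The following is a minimal sketch, not part of the Valohai tooling: the function name fill_account_id and the example policy fragment are hypothetical, and the only assumption taken from this guide is that the placeholder is spelled <AWS-ACCOUNT-ID> and the account ID has 12 digits.

```python
import re

PLACEHOLDER = "<AWS-ACCOUNT-ID>"


def fill_account_id(template: str, account_id: str) -> str:
    """Replace every <AWS-ACCOUNT-ID> placeholder with a 12-digit account ID."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are exactly 12 digits")
    return template.replace(PLACEHOLDER, account_id)


# Hypothetical policy fragment, used only to show the substitution.
example = '{"Resource": "arn:aws:s3:::valohai-data-<AWS-ACCOUNT-ID>/*"}'
print(fill_account_id(example, "123456789012"))
```

Running the same function over each policy document before pasting it into the Console avoids leaving a placeholder behind in one of the Resource ARNs.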

ValohaiQueuePolicy

Allows the queue instance to read secrets from Secrets Manager.

ValohaiWorkerPolicy

Allows worker instances to describe themselves and protect themselves from scale-down.

ValohaiS3MultipartPolicy

Allows uploading large files (>5GB) to S3.

Replace <AWS-ACCOUNT-ID> in both Resource ARNs.

ValohaiMasterPolicy

Allows Valohai to manage EC2 resources and access storage.

Replace both <AWS-ACCOUNT-ID> placeholders.

Create IAM Roles

Navigate to AWS Console > IAM > Roles and create these four roles.

Role Name               Use Case             Attach Policy             Additional Configuration
ValohaiQueueRole        EC2                  ValohaiQueuePolicy        Instance profile auto-created
ValohaiWorkerRole       EC2                  ValohaiWorkerPolicy       Instance profile auto-created
ValohaiS3MultipartRole  Another AWS Account  ValohaiS3MultipartPolicy  Account ID: <YOUR-AWS-ACCOUNT-ID>
ValohaiMaster           Another AWS Account  ValohaiMasterPolicy       Account ID: 635691382966

For the ValohaiMaster role:

After creation, verify the trust relationship. The principal ARN in the trust policy must match the valohai_assume_user provided by Valohai.
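As a point of comparison when checking the trust relationship, here is a sketch of the generic shape of an IAM trust policy that lets a single IAM user assume a role. It is an illustration under assumptions, not the exact document Valohai ships: the helper name trust_policy is hypothetical, and your actual policy may carry additional conditions.

```python
import json


def trust_policy(assume_user_arn: str) -> dict:
    """Build a trust policy that lets one IAM user assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": assume_user_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Example ARN from the prerequisites; use the value Valohai gave you.
arn = "arn:aws:iam::635691382966:user/valohai-customer-yourcompany"
print(json.dumps(trust_policy(arn), indent=2))
```

The printed JSON can be compared with what the Console shows under the role's Trust relationships tab.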

Important: ValohaiQueueRole and ValohaiWorkerRole need instance profiles. The Console creates these automatically. If you use the CLI or an SDK instead, create an instance profile for each role and add the role to it yourself.
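For readers scripting this step, the sketch below shows one way to create the two instance profiles with the AWS SDK for Python. It assumes an object that behaves like boto3.client("iam") is passed in; the helper names are hypothetical, and the profile is named after the role to mirror what the Console does.

```python
def ensure_instance_profile(iam, role_name: str) -> None:
    """Create an instance profile named after the role and attach the role.

    `iam` is expected to behave like boto3.client("iam").
    """
    iam.create_instance_profile(InstanceProfileName=role_name)
    iam.add_role_to_instance_profile(
        InstanceProfileName=role_name, RoleName=role_name
    )


def create_valohai_instance_profiles(iam) -> None:
    """Create profiles for the two EC2 roles defined in this guide."""
    for role_name in ("ValohaiQueueRole", "ValohaiWorkerRole"):
        ensure_instance_profile(iam, role_name)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   create_valohai_instance_profiles(boto3.client("iam"))
```

The sketch does not handle the case where a profile already exists; rerunning it after a partial failure would need that error caught.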

Step 2: Create VPC and Networking

Create VPC

Navigate to AWS Console > VPC > Your VPCs > Create VPC.

Settings:

  • Name: valohai-vpc

  • IPv4 CIDR: 10.0.0.0/16

  • No IPv6 CIDR

  • Tenancy: Default

  • Tags: Add Key=valohai, Value=1

Create Subnets

Create one subnet per availability zone in your region.

Navigate to VPC > Subnets > Create subnet.

Example for a region with four zones:

Name              Availability Zone  IPv4 CIDR
valohai-subnet-1  <region>a          10.0.0.0/20
valohai-subnet-2  <region>b          10.0.16.0/20
valohai-subnet-3  <region>c          10.0.32.0/20
valohai-subnet-4  <region>d          10.0.48.0/20

Add Key=valohai, Value=1 tag to each subnet.
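If your region has a different number of zones, the subnet ranges can be derived rather than typed. This is a small sketch using Python's standard ipaddress module; the zone suffix list is an assumption you should adjust to your region.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
zone_suffixes = ["a", "b", "c", "d"]  # adjust to the zones in your region

# Carve consecutive /20 blocks out of the VPC range, one per zone.
plan = [
    (f"valohai-subnet-{i}", f"<region>{zone}", str(cidr))
    for i, (zone, cidr) in enumerate(
        zip(zone_suffixes, vpc.subnets(new_prefix=20)), start=1
    )
]
for row in plan:
    print(*row)
```

With the four suffixes shown, the output reproduces the example subnet layout above. A /16 VPC holds sixteen /20 subnets, so there is room for more zones.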

Create Internet Gateway

Navigate to VPC > Internet Gateways > Create internet gateway.

Settings:

  • Name: valohai-igw

  • Tags: Add Key=valohai, Value=1

After creation, attach it to valohai-vpc:

  • Actions > Attach to VPC > Select valohai-vpc

Configure Route Table

Navigate to VPC > Route Tables.

Find the main route table for valohai-vpc and rename it to valohai-rt.

Edit routes to add:

Destination  Target
10.0.0.0/16  local
0.0.0.0/0    valohai-igw

Add Key=valohai, Value=1 tag to the route table.

Step 3: Create Security Groups

valohai-sg-workers

Navigate to EC2 > Security Groups > Create security group.

Basic details:

  • Name: valohai-sg-workers

  • Description: Security group for Valohai worker instances

  • VPC: valohai-vpc

Inbound rules:

  • None by default (add SSH from your IP if needed for debugging)

Outbound rules:

  • Type: All traffic

  • Destination: 0.0.0.0/0

Tags: Add Key=valohai, Value=1

valohai-sg-queue

Basic details:

  • Name: valohai-sg-queue

  • Description: Security group for Valohai queue instance

  • VPC: valohai-vpc

Inbound rules:

Type        Protocol  Port   Source              Description
Custom TCP  TCP       80     0.0.0.0/0           Let's Encrypt HTTP challenge
Custom TCP  TCP       63790  34.248.245.191/32   Redis from app.valohai.com
Custom TCP  TCP       63790  63.34.156.112/32    Redis from Valohai scaling service
Custom TCP  TCP       63790  valohai-sg-workers  Redis from workers
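The inbound rules above can also be expressed as data for a scripted setup. The sketch below builds them in the IpPermissions shape accepted by boto3's EC2 client; the function name and the security group ID are hypothetical, and the worker rule references the group by ID because that is how the API identifies a source security group.

```python
QUEUE_PORT = 63790


def queue_ingress_rules(workers_sg_id: str) -> list:
    """Return the queue's inbound rules in boto3 IpPermissions form."""

    def tcp(port, **source):
        return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port, **source}

    def cidr(ip, description):
        return {"IpRanges": [{"CidrIp": ip, "Description": description}]}

    return [
        tcp(80, **cidr("0.0.0.0/0", "Let's Encrypt HTTP challenge")),
        tcp(QUEUE_PORT, **cidr("34.248.245.191/32", "Redis from app.valohai.com")),
        tcp(QUEUE_PORT, **cidr("63.34.156.112/32", "Redis from Valohai scaling service")),
        tcp(QUEUE_PORT, UserIdGroupPairs=[
            {"GroupId": workers_sg_id, "Description": "Redis from workers"}
        ]),
    ]


rules = queue_ingress_rules("sg-0123456789abcdef0")  # hypothetical group ID
```

The resulting list could be passed as the IpPermissions argument of ec2.authorize_security_group_ingress, together with the GroupId of valohai-sg-queue.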

Outbound rules:

  • Type: All traffic

  • Destination: 0.0.0.0/0

Tags: Add Key=valohai, Value=1

Step 4: Create Secrets Manager Secret

Navigate to AWS Console > Secrets Manager > Store a new secret.

Secret type:

  • Other type of secret

Key/value:

  • Plaintext tab

  • Generate a strong password containing uppercase letters, lowercase letters, and digits (no special characters)

Secret name: valohai_redis_server

Tags: Add Key=valohai, Value=1

Rotation: Disable automatic rotation

Save the secret and note the password for the next step.
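One way to produce a password that meets the requirement above (uppercase, lowercase, digits, no special characters) is Python's standard secrets module. This is a sketch; the length of 32 is an assumption, not a value specified by Valohai.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 32) -> str:
    """Return a random alphanumeric password with upper, lower, and digit."""
    if length < 3:
        raise ValueError("length must allow one character of each class")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until all three character classes are present.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)):
            return candidate


print(generate_password())
```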

Step 5: Allocate Elastic IP

Navigate to EC2 > Elastic IPs > Allocate Elastic IP address.

Settings:

  • Network Border Group: Default

  • Public IPv4 address pool: Amazon's pool of IPv4 addresses

  • Tags: Add Key=Name, Value=valohai-ip-queue and Key=valohai, Value=1

Allocate the IP. You'll associate it with the queue instance in the next step.

Step 6: Create Queue Instance

Navigate to EC2 > Instances > Launch instance.

Basic Configuration

Name: valohai-queue

Application and OS Images:

  • Ubuntu Server 24.04 LTS

  • Architecture: 64-bit (x86)

Instance type: t3.medium

Key pair: Select your existing key pair

Network Settings

VPC: valohai-vpc

Subnet: Any subnet (e.g., valohai-subnet-1)

Auto-assign public IP: Disable (the Elastic IP from Step 5 is used instead)

Firewall (security groups): Select valohai-sg-queue

Storage

Root volume:

  • Size: 32 GiB

  • Volume type: gp3 (General Purpose SSD)

  • Encrypted: Optional (recommended)

Advanced Details

IAM instance profile: ValohaiQueueRole

User data: Paste this script, replacing <queue_address> with your actual queue address from Valohai:

Tags: Add Key=valohai, Value=1

Launch the instance.

Associate Elastic IP

After the instance is running:

  1. Navigate to EC2 > Elastic IPs

  2. Select your valohai-ip-queue

  3. Actions > Associate Elastic IP address

  4. Select the valohai-queue instance

  5. Associate

Step 7: Create S3 Bucket

Navigate to S3 > Create bucket.

Bucket name: valohai-data-<AWS-ACCOUNT-ID>

  • Replace <AWS-ACCOUNT-ID> with your 12-digit account ID

  • Example: valohai-data-123456789012

Region: Same as your EC2 instances

Block Public Access: Enable (all checkboxes)

Bucket Versioning: Disabled

Tags: Add Key=valohai, Value=1

Encryption: Enable (AWS managed keys)

Create the bucket.

Configure CORS

After creating the bucket:

  1. Open the bucket

  2. Navigate to Permissions tab

  3. Scroll to Cross-origin resource sharing (CORS)

  4. Click Edit and paste:

Save changes.

Step 8: Verify and Share Information

Collect Required Information

Gather these values to send to Valohai:

Account and Location:

  • AWS Account ID: ____________

  • Region: ____________

IAM:

  • ValohaiMaster Role ARN: arn:aws:iam::<account-id>:role/ValohaiMaster

Networking:

  • VPC ID: vpc-____________

  • Subnet IDs: subnet-________, subnet-________, ...

Queue Instance:

  • Private IP: ____________

  • Public IP (Elastic IP): ____________

Storage:

  • S3 Bucket Name: valohai-data-<account-id>

Send to Valohai

Send this information to your Valohai contact using your organization's secure communication method (e.g., password-protected document, encrypted email).

Verify Queue Instance

SSH into the queue instance to verify it's running correctly:

You should see the service is active and running. Check logs with:

Troubleshooting

Queue instance not accessible

Check security group rules:

Verify port 63790 is open from 34.248.245.191/32 and 63.34.156.112/32.
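The rule check above can be automated against the output of the EC2 API. The sketch below inspects one security group entry as returned in the "SecurityGroups" list of boto3's ec2.describe_security_groups(); the function name is hypothetical, and it only matches the two CIDRs literally, so a broader range that happens to cover them is reported as missing.

```python
REQUIRED_SOURCES = {"34.248.245.191/32", "63.34.156.112/32"}
QUEUE_PORT = 63790


def missing_queue_sources(security_group: dict) -> set:
    """Return required CIDRs with no inbound rule covering the queue port."""
    allowed = set()
    for perm in security_group.get("IpPermissions", []):
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            covers_port = True  # "all traffic" rule
        elif protocol == "tcp":
            low = perm.get("FromPort", 0)
            high = perm.get("ToPort", 65535)
            covers_port = low <= QUEUE_PORT <= high
        else:
            covers_port = False
        if covers_port:
            allowed.update(r["CidrIp"] for r in perm.get("IpRanges", []))
    return REQUIRED_SOURCES - allowed


# Usage (requires boto3 and AWS credentials; the group ID is a placeholder):
#   import boto3
#   resp = boto3.client("ec2").describe_security_groups(GroupIds=["sg-..."])
#   print(missing_queue_sources(resp["SecurityGroups"][0]))
```

An empty set means both Valohai addresses can reach port 63790.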

Check instance status:

Verify user data script ran:

IAM role issues

Verify trust relationship:

Check the AssumeRolePolicyDocument contains the correct Valohai ARN.
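The trust check can be scripted as well. This sketch assumes the "Role" dict returned by boto3's iam.get_role(), where AssumeRolePolicyDocument is already parsed into a dict; the function name trusts_principal is hypothetical, and policy conditions are not evaluated.

```python
def trusts_principal(role: dict, expected_arn: str) -> bool:
    """Check whether a role's trust policy allows expected_arn to assume it."""
    for stmt in role["AssumeRolePolicyDocument"].get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        if "sts:AssumeRole" not in actions:
            continue
        principal = stmt.get("Principal", {})
        # Principal.AWS may be a single ARN string or a list of ARNs.
        aws = principal.get("AWS", []) if isinstance(principal, dict) else []
        aws = [aws] if isinstance(aws, str) else aws
        if expected_arn in aws:
            return True
    return False


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   role = boto3.client("iam").get_role(RoleName="ValohaiMaster")["Role"]
#   print(trusts_principal(role, "<your valohai_assume_user ARN>"))
```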

Check instance profile:

S3 access errors

Verify bucket policy and CORS:

Test worker access:

Cannot reach queue from workers

Check network connectivity:

Verify security group:

  • Ensure valohai-sg-queue allows inbound from valohai-sg-workers
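A quick connectivity test from a worker can be done with a plain TCP connection attempt to the queue port. This is a sketch using only the standard library; the function name can_connect and the 5-second timeout are assumptions, and a successful connection only shows the port is reachable, not that Redis authentication works.

```python
import socket


def can_connect(host: str, port: int = 63790, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Usage, run on a worker instance with your own queue address:
#   print(can_connect("something.vqueue.net"))
```

A False result with a correct address usually points at the security group rule or the route table rather than at the queue service itself.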

Getting Help

Before contacting support, collect:

  • CloudTrail logs for any permission errors

  • Security group configurations

  • IAM role trust relationships

  • Queue instance logs (/var/log/cloud-init-output.log)

Contact: Valohai support

Include in your message:

  • AWS Account ID

  • Region

  • Specific error messages

  • Steps you've already tried
