Self-Hosted on EC2
Deploy a fully self-hosted Valohai installation on AWS EC2 with all components in your environment
Deploy the complete Valohai platform inside your AWS environment. This guide covers installing both the Application Layer and the Compute & Data Layer in your AWS account.
A self-hosted installation allows you to run all components of Valohai inside your own network. Users will access a version of Valohai hosted by you, not app.valohai.com.
Updates to the platform are delivered through Docker images.
Consider hybrid first: Most organizations use hybrid deployment where Valohai manages the application layer. Self-hosted requires additional operational overhead.
What Gets Deployed
The following tables show the default configuration values.
Security Groups
valohai-sg-workers (worker instances executing ML jobs)
• Inbound: SSH from admin IPs (optional, for debugging)
• Outbound: allow all (or block if ML jobs should not access the public internet)

valohai-sg-master (Valohai web application instance)
• Inbound: port 22 from admin IPs; port 80 from the load balancer
• Outbound: access to database, queue, and internet

valohai-sg-database (PostgreSQL database)
• Inbound: port 5432 from valohai-sg-master
• Outbound: none

valohai-sg-queue (Redis queue)
• Inbound: port 6379 from valohai-sg-master and valohai-sg-workers
• Outbound: none

valohai-sg-loadbalancer (application load balancer)
• Inbound: port 443 from 0.0.0.0/0 (HTTPS from users)
• Outbound: port 80 to valohai-sg-master
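The rule matrix above can also be encoded as plain data, which is handy for reviewing changes or driving automation. This is an illustrative sketch, not part of Valohai; "admin" and "anywhere" are placeholders for your admin CIDRs and 0.0.0.0/0.

```python
# Security-group inbound rules from the tables above, encoded as data.
# Each entry is (port, allowed_source); "admin" and "anywhere" are
# placeholders for your admin CIDR ranges and 0.0.0.0/0.
INBOUND = {
    "valohai-sg-workers": [(22, "admin")],
    "valohai-sg-master": [(22, "admin"), (80, "valohai-sg-loadbalancer")],
    "valohai-sg-database": [(5432, "valohai-sg-master")],
    "valohai-sg-queue": [(6379, "valohai-sg-master"), (6379, "valohai-sg-workers")],
    "valohai-sg-loadbalancer": [(443, "anywhere")],
}

def allowed_sources(group: str, port: int) -> set:
    """Sources permitted to reach `group` on `port`, per the matrix above."""
    return {src for p, src in INBOUND[group] if p == port}
```

For example, `allowed_sources("valohai-sg-database", 5432)` confirms that only valohai-sg-master may reach the database.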
Core Services
EC2 Instance (valohai-roi)
Hosts the Valohai web application, deployment image building, and scaling services.
• Instance type: m5a.xlarge (4 vCPUs, 16 GB RAM)
• OS: Ubuntu 22.04 LTS
• Storage: 32 GB

RDS PostgreSQL
Stores user data and execution metadata (who ran what, when, with which configuration). Does not store actual content (data, code, models).
• Instance class: db.t2.large minimum
• Engine: PostgreSQL 14.2

ElastiCache Redis
Job queue and short-term execution log storage.
• Node type: cache.m3.xlarge
• Engine: Redis 6.2

Application Load Balancer
HTTPS endpoint for user access.
• HTTP/2 enabled
• Routes traffic to the valohai-roi instance
IAM Roles
valohai-master-role (attached to the EC2 instance running the Valohai web app)
• Create and edit autoscaling groups
• Launch and terminate EC2 instances for ML jobs
• Upload and download files from the default S3 storage
• Access Valohai secrets from Secrets Manager

valohai-worker-role (attached to EC2 instances running ML jobs)
• Set instance protection
• Describe its own instance metadata

valohai-multipart-role (used by the web app for large file uploads)
• Multipart upload operations to S3 (for files over 5 GB)
Storage
S3 Bucket (artifact and code storage)
• Git commit snapshots for reproducibility
• Execution logs (moved from Redis after completion)
• Input datasets and output artifacts
Other Resources
AWS Secrets Manager: stores the RDS password and Valohai configuration secrets
AWS SSM Parameter Store: stores configuration details
Prerequisites
From Valohai:
Contact [email protected] to receive:
Docker images for the Valohai application
Required configuration values and permissions
From your AWS account:
Admin access to AWS Console or CLI
VPC (existing or new)
DNS domain name
SSL/TLS certificate for HTTPS
Installation Methods
AWS CDK
Deploy using AWS Cloud Development Kit.
Repository: github.com/valohai/valohai-cdk-self-hosted
Terraform
Deploy using Terraform scripts.
Repository: github.com/valohai/valohai-self-hosted-aws-tf
Manual Setup
Follow the manual deployment steps below for complete control over the installation.
Manual Deployment
VPC
Valohai can be deployed in your existing VPC or in a new separate VPC.
Security Groups
Create the security groups listed in the "What Gets Deployed" section above.
valohai-sg-workers
Inbound: Allow SSH from admin IP addresses for debugging
Outbound: Block outbound access if ML jobs should not reach the public internet
valohai-sg-master
Port 22 from admin IP addresses
Port 80 from valohai-sg-loadbalancer
valohai-sg-database
Port 5432 from valohai-sg-master
valohai-sg-queue
Port 6379 from valohai-sg-master
Port 6379 from valohai-sg-workers
valohai-sg-loadbalancer
Port 443 from 0.0.0.0/0 (HTTPS from any source)
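Once the groups are in place, a quick way to confirm that the master instance can actually reach the database and queue is a plain TCP connect test run from the valohai-roi host. This is an illustrative helper, not a Valohai tool; the endpoint hostnames in the comments are placeholders for your RDS and ElastiCache endpoints.

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run from the valohai-roi instance; replace with your actual endpoints:
# port_open("valohai-psql.<id>.<region>.rds.amazonaws.com", 5432)
# port_open("valohai-queue.<id>.cache.amazonaws.com", 6379)
```

The same check run from a worker instance should succeed for port 6379 (queue) and fail for port 5432, since the database group only admits valohai-sg-master.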
IAM Roles
ValohaiWorker - IAM Role
Default role for all EC2 instances launched by Valohai for ML jobs. This is the minimum requirement.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "1",
"Effect": "Allow",
"Action": "autoscaling:SetInstanceProtection",
"Resource": "*"
},
{
"Sid": "2",
"Effect": "Allow",
"Action": "ec2:DescribeInstances",
"Resource": "*"
}
]
}

ValohaiMaster - IAM User
Used for creating and scaling EC2 resources for ML jobs launched by users. This user also has access to the Valohai default S3 bucket and can access secrets from AWS Secrets Manager that are tagged with valohai=1.
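Because this user can launch and terminate instances, it is worth double-checking that every mutating statement in the policy document below stays scoped to resources tagged valohai=1. The following sanity check is an illustrative sketch (the helper names and the `master-policy.json` file name are not part of Valohai):

```python
import json

# Actions that change state and should therefore be tag-scoped.
MUTATING = {"ec2:RunInstances", "ec2:TerminateInstances",
            "autoscaling:UpdateAutoScalingGroup"}

def tag_scoped(statement: dict) -> bool:
    """True if the statement carries a .../valohai = "1" resource-tag condition."""
    cond = statement.get("Condition", {})
    return any(
        values.get("aws:ResourceTag/valohai") == "1"
        or values.get("secretsmanager:ResourceTag/valohai") == "1"
        for values in cond.values()
    )

def unscoped_mutations(policy: dict) -> list:
    """Sids of mutating statements that lack the valohai tag condition."""
    bad = []
    for stmt in policy["Statement"]:
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if MUTATING & set(actions) and not tag_scoped(stmt):
            bad.append(stmt.get("Sid", "<no sid>"))
    return bad

# Usage: unscoped_mutations(json.load(open("master-policy.json")))
```

An empty result means every instance-launching or autoscaling-modifying statement is restricted to valohai-tagged resources, matching the intent of the policy below.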
{
"Version" : "2012-10-17",
"Statement" : [
{
"Sid" : "2",
"Effect" : "Allow",
"Action" : [
"ec2:DescribeInstances",
"ec2:DescribeVpcs",
"ec2:DescribeKeyPairs",
"ec2:DescribeImages",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeInstanceTypes",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions",
"ec2:DescribeInstanceAttribute",
"ec2:CreateTags",
"ec2:DescribeInternetGateways",
"ec2:DescribeRouteTables",
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeScalingActivities"
],
"Resource" : "*"
},
{
"Sid" : "AllowUpdatingSpotLaunchTemplates",
"Effect" : "Allow",
"Action" : [
"ec2:CreateLaunchTemplate",
"ec2:CreateLaunchTemplateVersion",
"ec2:ModifyLaunchTemplate",
"ec2:RunInstances",
"ec2:TerminateInstances",
"ec2:RebootInstances",
"autoscaling:UpdateAutoScalingGroup",
"autoscaling:CreateOrUpdateTags",
"autoscaling:SetDesiredCapacity",
"autoscaling:CreateAutoScalingGroup"
],
"Resource" : "*",
"Condition" : {
"ForAllValues:StringEquals" : {
"aws:ResourceTag/valohai" : "1"
}
}
},
{
"Sid" : "ServiceLinkedRole",
"Effect" : "Allow",
"Action" : "iam:CreateServiceLinkedRole",
"Resource" : "arn:aws:iam::*:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
},
{
"Sid" : "4",
"Effect" : "Allow",
"Action" : [
"iam:PassRole",
"iam:GetRole"
],
"Resource" : "arn:aws:iam::ACCOUNT-ID:role/ValohaiWorkerRole"
},
{
"Sid" : "0",
"Effect" : "Allow",
"Action" : [
"secretsmanager:GetResourcePolicy",
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret",
"secretsmanager:ListSecretVersionIds"
],
"Resource" : "*",
"Condition" : {
"StringEquals" : {
"secretsmanager:ResourceTag/valohai" : "1"
}
}
},
{
"Action" : "secretsmanager:GetRandomPassword",
"Resource" : "*",
"Effect" : "Allow",
"Sid" : "1"
},
{
"Effect" : "Allow",
"Action" : "s3:*",
"Resource" : [
"arn:aws:s3:::your S3 bucket name",
"arn:aws:s3:::your S3 bucket name/*"
]
}
]
}

ValohaiMultiPartUploadRole - IAM Role
Used to upload files larger than 5 GB to the S3 bucket.
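The 5 GB threshold comes from S3 itself: a single PUT is limited to 5 GB, so larger files must use the multipart API, which allows parts of 5 MiB to 5 GiB and at most 10,000 parts per upload. A quick sketch of how a chosen part size maps to a part count (illustrative helper, not a Valohai API):

```python
import math

MIN_PART = 5 * 1024**2    # 5 MiB minimum part size (except the final part)
MAX_PART = 5 * 1024**3    # 5 GiB maximum part size
MAX_PARTS = 10_000        # S3 limit on parts per multipart upload

def part_count(file_size: int, part_size: int) -> int:
    """Number of parts needed to upload file_size bytes in part_size chunks."""
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    parts = math.ceil(file_size / part_size)
    if parts > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return parts

# e.g. a 12 GiB model checkpoint uploaded in 100 MiB parts:
# part_count(12 * 1024**3, 100 * 1024**2)  # → 123 parts
```

The policy below grants exactly the S3 operations this flow needs: initiating, listing, completing, and aborting multipart uploads.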
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1503921756000",
"Effect": "Allow",
"Action": [
"s3:AbortMultipartUpload",
"s3:GetObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:ListBucketVersions",
"s3:ListMultipartUploadParts",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::your S3 bucket name",
"arn:aws:s3:::your S3 bucket name/*"
]
}
]
}

Core Valohai Resources
The following tables show the minimum configuration needed for Valohai. Your actual requirements may vary based on your organization's needs.
Compute
• Name: valohai-roi
• Instance type: m5a.xlarge (4 vCPUs, 16 GB RAM)
• Operating system: Ubuntu 22.04 LTS
• Storage: 32 GB
• Security group: valohai-sg-master
• Application port: 8000 (served via load balancer)
Storage
• Name: yourbucketname-valohai
• Type: S3 bucket
• Public access: blocked
Database
• Name: valohai-psql
• Engine: PostgreSQL 14.2
• Instance class: db.t2.large
• Port: 5432
• Security group: valohai-sg-database
• Public accessibility: no
Queue
• Name: valohai-queue
• Service: ElastiCache Redis
• Engine version: 6.2
• Node type: cache.m3.xlarge
• Number of nodes: 1
Load Balancer
• Type: Application Load Balancer
• Protocol: HTTPS (HTTP/2 enabled)
• Backend target: valohai-roi instance on port 8000
DNS
Provide a DNS name to point at the load balancer for web application access (e.g., valohai.yourdomain.net).
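Before sharing the address with users, it is worth confirming that the record actually resolves; `valohai.yourdomain.net` below is the same placeholder used above, and this helper is an illustrative sketch rather than part of Valohai.

```python
import socket

def resolves(hostname: str) -> bool:
    """True if DNS resolution for hostname succeeds."""
    try:
        socket.getaddrinfo(hostname, 443)
        return True
    except socket.gaierror:
        return False

# Replace with the DNS name you pointed at the load balancer:
# resolves("valohai.yourdomain.net")
```

A successful resolution plus an HTTPS response on port 443 (see the connectivity check earlier in this guide, or simply open the URL in a browser) confirms the load balancer, certificate, and DNS record are wired together.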
Next Steps
After deployment, contact [email protected] to complete the configuration and initialize the application.