Self-Hosted on EC2

Deploy a fully self-hosted Valohai installation on AWS EC2 with all components in your environment

Deploy the complete Valohai platform inside your AWS environment. This guide covers installing both the Application Layer and the Compute & Data Layer in your AWS account.

A self-hosted installation allows you to run all components of Valohai inside your own network. Users will access a version of Valohai hosted by you, not app.valohai.com.

Updates to the platform are delivered through Docker images.

Consider hybrid first: Most organizations use a hybrid deployment, where Valohai manages the application layer. A self-hosted installation adds operational overhead.

What Gets Deployed

The following tables show the default configuration values.

Security Groups

| Security Group | Purpose | Inbound Rules | Outbound Rules |
| --- | --- | --- | --- |
| valohai-sg-workers | Worker instances executing ML jobs | SSH from admin IPs (optional, for debugging) | Allow all (or block if ML jobs should not access the public internet) |
| valohai-sg-master | Valohai web application instance | • Port 22 from admin IPs • Port 80 from load balancer | Access to database, queue, and internet |
| valohai-sg-database | PostgreSQL database | Port 5432 from valohai-sg-master | None |
| valohai-sg-queue | Redis queue | Port 6379 from valohai-sg-master and valohai-sg-workers | None |
| valohai-sg-loadbalancer | Application load balancer | Port 443 from 0.0.0.0/0 (HTTPS from users) | Port 80 to valohai-sg-master |

Core Services

| Service | Purpose | Specifications |
| --- | --- | --- |
| EC2 Instance (valohai-roi) | Hosts Valohai web application, deployment image building, and scaling services | • Instance type: m5a.xlarge (4 vCPUs, 16 GB RAM) • OS: Ubuntu 22.04 LTS • Storage: 32 GB |
| RDS PostgreSQL | Stores user data and execution metadata (who ran what, when, with which configuration). Does not store actual content (data, code, models). | • Instance class: db.t2.large minimum • Engine: PostgreSQL 14.2 |
| ElastiCache Redis | Job queue and short-term execution log storage | • Node type: cache.m3.xlarge • Engine: Redis 6.2 |
| Application Load Balancer | HTTPS endpoint for user access | • HTTP/2 enabled • Routes traffic to valohai-roi instance |

IAM Roles

| Role | Attached To | Permissions |
| --- | --- | --- |
| valohai-master-role | EC2 instance running Valohai web app | • Create and edit autoscaling groups • Launch and terminate EC2 instances for ML jobs • Upload and download files from default S3 storage • Access Valohai secrets from Secrets Manager |
| valohai-worker-role | EC2 instances running ML jobs | • Set instance protection • Describe its own instance metadata |
| valohai-multipart-role | Web app for large file uploads | • Multipart upload operations to S3 (for files >5 GB) |

Storage

| Resource | Purpose | Details |
| --- | --- | --- |
| S3 Bucket | Artifact and code storage | • Git commit snapshots for reproducibility • Execution logs (moved from Redis after completion) • Input datasets and output artifacts |

Other Resources

| Resource | Purpose |
| --- | --- |
| AWS Secrets Manager | Stores RDS password and Valohai configuration secrets |
| AWS SSM Parameter Store | Stores configuration details |

Prerequisites

From Valohai:

Contact Valohai to receive:

Docker images for the Valohai application
Required configuration values and permissions

From your AWS account:

Admin access to AWS Console or CLI
VPC (existing or new)
DNS domain name
SSL/TLS certificate for HTTPS

Installation Methods

AWS CDK

Deploy using AWS Cloud Development Kit.

Repository: github.com/valohai/valohai-cdk-self-hosted

Terraform

Deploy using Terraform scripts.

Repository: github.com/valohai/valohai-self-hosted-aws-tf

Manual Setup

Follow the manual deployment steps below for complete control over the installation.

Manual Deployment

VPC

Valohai can be deployed in your existing VPC or in a new separate VPC.

Security Groups

Create the security groups listed in the "What Gets Deployed" section above.

valohai-sg-workers

Inbound: Allow SSH (port 22) from admin IP addresses (optional, for debugging)
Outbound: Allow all, or block outbound access if ML jobs must not reach the public internet

valohai-sg-master

Port 22 from admin IP addresses
Port 80 from valohai-sg-loadbalancer

valohai-sg-database

Port 5432 from valohai-sg-master

valohai-sg-queue

Port 6379 from valohai-sg-master
Port 6379 from valohai-sg-workers

valohai-sg-loadbalancer

Port 443 from 0.0.0.0/0 (HTTPS from any address)
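The inbound rules above can be kept as data and reviewed before anything is created in AWS. The following is a minimal sketch, not part of the Valohai tooling: the `IngressRule` class, the `to_ip_permission` helper, and the `ADMIN_CIDR` value are hypothetical names chosen for illustration. The output dictionary follows the `IpPermissions` shape accepted by the EC2 `AuthorizeSecurityGroupIngress` API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    group: str   # security group that receives the traffic
    port: int    # TCP port to open
    source: str  # CIDR block, or the name of a source security group


# Assumption: replace with your real admin address range.
ADMIN_CIDR = "203.0.113.0/24"

INGRESS_RULES = [
    IngressRule("valohai-sg-workers", 22, ADMIN_CIDR),  # optional, debugging only
    IngressRule("valohai-sg-master", 22, ADMIN_CIDR),
    IngressRule("valohai-sg-master", 80, "valohai-sg-loadbalancer"),
    IngressRule("valohai-sg-database", 5432, "valohai-sg-master"),
    IngressRule("valohai-sg-queue", 6379, "valohai-sg-master"),
    IngressRule("valohai-sg-queue", 6379, "valohai-sg-workers"),
    IngressRule("valohai-sg-loadbalancer", 443, "0.0.0.0/0"),
]


def to_ip_permission(rule: IngressRule, group_ids: dict) -> dict:
    """Translate one rule into an EC2 IpPermissions entry.

    group_ids maps security group names to their sg-... identifiers.
    """
    permission = {"IpProtocol": "tcp", "FromPort": rule.port, "ToPort": rule.port}
    if "/" in rule.source:
        # The source is a CIDR block.
        permission["IpRanges"] = [{"CidrIp": rule.source}]
    else:
        # The source is another security group, referenced by ID.
        permission["UserIdGroupPairs"] = [{"GroupId": group_ids[rule.source]}]
    return permission
```

Each resulting dictionary could then be passed to the AWS SDK or CLI of your choice once the security groups exist and their IDs are known.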

IAM Roles

ValohaiWorker - IAM Role

Default role for all EC2 instances launched by Valohai for ML jobs. These are the minimum permissions required.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Action": "autoscaling:SetInstanceProtection",
            "Resource": "*"
        },
        {
            "Sid": "2",
            "Effect": "Allow",
            "Action": "ec2:DescribeInstances",
            "Resource": "*"
        }
    ]
}

ValohaiMaster - IAM User

Used to create and scale the EC2 resources for ML jobs launched by users. This user can also access the Valohai default S3 bucket and read secrets in AWS Secrets Manager that carry the tag valohai with the value 1.

{
   "Version" : "2012-10-17",
   "Statement" : [
     {
       "Sid" : "2",
       "Effect" : "Allow",
       "Action" : [
         "ec2:DescribeInstances",
         "ec2:DescribeVpcs",
         "ec2:DescribeKeyPairs",
         "ec2:DescribeImages",
         "ec2:DescribeSecurityGroups",
         "ec2:DescribeSubnets",
         "ec2:DescribeInstanceTypes",
         "ec2:DescribeLaunchTemplates",
         "ec2:DescribeLaunchTemplateVersions",
         "ec2:DescribeInstanceAttribute",
         "ec2:CreateTags",
         "ec2:DescribeInternetGateways",
         "ec2:DescribeRouteTables",
         "autoscaling:DescribeAutoScalingGroups",
         "autoscaling:DescribeScalingActivities"
       ],
       "Resource" : "*"
     },
     {
       "Sid" : "AllowUpdatingSpotLaunchTemplates",
       "Effect" : "Allow",
       "Action" : [
         "ec2:CreateLaunchTemplate",
         "ec2:CreateLaunchTemplateVersion",
         "ec2:ModifyLaunchTemplate",
         "ec2:RunInstances",
         "ec2:TerminateInstances",
         "ec2:RebootInstances",
         "autoscaling:UpdateAutoScalingGroup",
         "autoscaling:CreateOrUpdateTags",
         "autoscaling:SetDesiredCapacity",
         "autoscaling:CreateAutoScalingGroup"
       ],
       "Resource" : "*",
       "Condition" : {
         "ForAllValues:StringEquals" : {
           "aws:ResourceTag/valohai" : "1"
         }
       }
     },
     {
       "Sid" : "ServiceLinkedRole",
       "Effect" : "Allow",
       "Action" : "iam:CreateServiceLinkedRole",
       "Resource" : "arn:aws:iam::*:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
     },
     {
       "Sid" : "4",
       "Effect" : "Allow",
       "Action" : [
         "iam:PassRole",
         "iam:GetRole"
       ],
       "Resource" : "arn:aws:iam::ACCOUNT-ID:role/ValohaiWorkerRole"
     },
     {
       "Sid" : "0",
       "Effect" : "Allow",
       "Action" : [
         "secretsmanager:GetResourcePolicy",
         "secretsmanager:GetSecretValue",
         "secretsmanager:DescribeSecret",
         "secretsmanager:ListSecretVersionIds"
       ],
       "Resource" : "*",
       "Condition" : {
         "StringEquals" : {
           "secretsmanager:ResourceTag/valohai" : "1"
         }
       }
     },
     {
       "Action" : "secretsmanager:GetRandomPassword",
       "Resource" : "*",
       "Effect" : "Allow",
       "Sid" : "1"
     },
     {
       "Effect" : "Allow",
       "Action" : "s3:*",
       "Resource" : [
         "arn:aws:s3:::your S3 bucket name",
         "arn:aws:s3:::your S3 bucket name/*"
       ]
     }
   ]
 }
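The policy above contains two placeholders that must be replaced before use: ACCOUNT-ID in the role ARN and "your S3 bucket name" in the S3 resource ARNs. As a minimal sketch, a small check such as the one below can catch a policy document that still contains them. The `find_placeholders` function is a hypothetical helper written for this guide, not part of any AWS or Valohai tool.

```python
import json

# Placeholder strings used in the policy documents in this guide.
PLACEHOLDERS = ("ACCOUNT-ID", "your S3 bucket name")


def find_placeholders(policy_json: str) -> list[str]:
    """Return the placeholders still present in a policy document.

    Raises ValueError if the document is not valid JSON.
    """
    json.loads(policy_json)  # syntax check only; the parsed result is not needed
    return [p for p in PLACEHOLDERS if p in policy_json]


if __name__ == "__main__":
    sample = '{"Resource": "arn:aws:iam::ACCOUNT-ID:role/ValohaiWorkerRole"}'
    print(find_placeholders(sample))
```

An empty list means both placeholders have been filled in. It does not validate the policy against IAM itself.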

ValohaiMultiPartUploadRole - IAM Role

Used to upload files larger than 5 GB to the S3 bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1503921756000",
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListBucketVersions",
                "s3:ListMultipartUploadParts",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::your S3 bucket name",
                "arn:aws:s3:::your S3 bucket name/*"
            ]
        }
    ]
}
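Because the bucket name appears in both resource ARNs, it can help to generate the document from a single value. The sketch below rebuilds the multipart upload policy shown above for a given bucket. The function name `multipart_upload_policy` is an assumption made for illustration, and the bucket name in the usage line is the example name used later in this guide.

```python
import json

# Actions copied from the multipart upload policy above.
MULTIPART_ACTIONS = [
    "s3:AbortMultipartUpload",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:ListBucketMultipartUploads",
    "s3:ListBucketVersions",
    "s3:ListMultipartUploadParts",
    "s3:PutObject",
]


def multipart_upload_policy(bucket_name: str) -> dict:
    """Build the multipart upload policy for one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Stmt1503921756000",
                "Effect": "Allow",
                "Action": MULTIPART_ACTIONS,
                # Bucket-level actions need the bucket ARN,
                # object-level actions need the /* form.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(multipart_upload_policy("yourbucketname-valohai"), indent=4))
```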

Core Valohai Resources

The following tables show the minimum configuration needed for Valohai. Your actual requirements may vary based on your organization's needs.

Compute

| Resource | Configuration |
| --- | --- |
| Name | valohai-roi |
| Instance type | m5a.xlarge (4 vCPUs, 16 GB RAM) |
| Operating system | Ubuntu 22.04 LTS |
| Storage | 32 GB |
| Security group | valohai-sg-master |
| Application port | 8000 (served via load balancer) |

Storage

| Resource | Configuration |
| --- | --- |
| Name | yourbucketname-valohai |
| Type | S3 bucket |
| Public access | Blocked |

Database

| Resource | Configuration |
| --- | --- |
| Name | valohai-psql |
| Engine | PostgreSQL 14.2 |
| Instance class | db.t2.large |
| Port | 5432 |
| Security group | valohai-sg-database |
| Public accessibility | |

Queue

| Resource | Configuration |
| --- | --- |
| Name | valohai-queue |
| Service | ElastiCache Redis |
| Engine version | 6.2 |
| Node type | cache.m3.xlarge |
| Number of nodes | |

Load Balancer

| Resource | Configuration |
| --- | --- |
| Type | Application Load Balancer |
| Protocol | HTTPS (HTTP/2 enabled) |
| Backend target | valohai-roi instance on port 8000 |

DNS

Provide a DNS name to point at the load balancer for web application access (e.g., valohai.yourdomain.net).
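If the domain happens to be hosted in Amazon Route 53 (an assumption; any DNS provider works), the record can be described as a change batch in the format used by the Route 53 `ChangeResourceRecordSets` API. The sketch below builds a CNAME record for the example hostname. The function name and the load balancer DNS name are hypothetical placeholders.

```python
import json


def cname_change_batch(record_name: str, load_balancer_dns: str, ttl: int = 300) -> dict:
    """Build a Route 53 change batch that points record_name at the load balancer."""
    return {
        "Comment": "Point the Valohai hostname at the load balancer",
        "Changes": [
            {
                # UPSERT creates the record, or updates it if it already exists.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": load_balancer_dns}],
                },
            }
        ],
    }


if __name__ == "__main__":
    batch = cname_change_batch(
        "valohai.yourdomain.net",
        # Hypothetical value: use the DNS name shown for your load balancer.
        "valohai-alb-1234567890.eu-west-1.elb.amazonaws.com",
    )
    print(json.dumps(batch, indent=2))
```

A CNAME record cannot be placed at the zone apex (for example yourdomain.net itself), so this form suits a subdomain such as the one in the example.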

Next Steps

After deployment, contact Valohai to complete the configuration and initialize the application.
