Self-Hosted on EC2

Deploy a fully self-hosted Valohai installation on AWS EC2 with all components in your environment

Deploy the complete Valohai platform inside your AWS environment. This guide covers installing both the Application Layer and the Compute & Data Layer in your AWS account.

A self-hosted installation allows you to run all components of Valohai inside your own network. Users will access a version of Valohai hosted by you, not app.valohai.com.

Updates to the platform are delivered through Docker images.

Consider hybrid first: most organizations use a hybrid deployment, in which Valohai manages the application layer. A self-hosted installation requires additional operational overhead.

What Gets Deployed

The following tables show the default configuration values.

Security Groups

| Security group | Purpose | Inbound rules | Outbound rules |
|---|---|---|---|
| valohai-sg-workers | Worker instances executing ML jobs | SSH from admin IPs (optional, for debugging) | Allow all (or block if ML jobs should not access the public internet) |
| valohai-sg-master | Valohai web application instance | Port 22 from admin IPs; port 80 from the load balancer | Access to the database, queue, and internet |
| valohai-sg-database | PostgreSQL database | Port 5432 from valohai-sg-master | None |
| valohai-sg-queue | Redis queue | Port 6379 from valohai-sg-master and valohai-sg-workers | None |
| valohai-sg-loadbalancer | Application load balancer | Port 443 from 0.0.0.0/0 (HTTPS from users) | Port 80 to valohai-sg-master |

Core Services

| Service | Purpose | Specifications |
|---|---|---|
| EC2 instance (valohai-roi) | Hosts the Valohai web application, deployment image building, and scaling services | Instance type: m5a.xlarge (4 vCPUs, 16 GB RAM); OS: Ubuntu 22.04 LTS; storage: 32 GB |
| RDS PostgreSQL | Stores user data and execution metadata (who ran what, when, and with which configuration). Does not store actual content (data, code, models). | Instance class: db.t2.large minimum; engine: PostgreSQL 14.2 |
| ElastiCache Redis | Job queue and short-term execution log storage | Node type: cache.m3.xlarge; engine: Redis 6.2 |
| Application Load Balancer | HTTPS endpoint for user access | HTTP/2 enabled; routes traffic to the valohai-roi instance |

IAM Roles

| Role | Attached to | Permissions |
|---|---|---|
| valohai-master-role | EC2 instance running the Valohai web app | Create and edit autoscaling groups; launch and terminate EC2 instances for ML jobs; upload and download files from the default S3 storage; access Valohai secrets in Secrets Manager |
| valohai-worker-role | EC2 instances running ML jobs | Set instance protection; describe its own instance metadata |
| valohai-multipart-role | Web app, for large file uploads | Multipart upload operations to S3 (for files larger than 5 GB) |

Storage

| Resource | Purpose | Details |
|---|---|---|
| S3 bucket | Artifact and code storage | Git commit snapshots for reproducibility; execution logs (moved from Redis after completion); input datasets and output artifacts |

Other Resources

| Resource | Purpose |
|---|---|
| AWS Secrets Manager | Stores the RDS password and Valohai configuration secrets |
| AWS SSM Parameter Store | Stores configuration details |
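Secrets Manager holds the RDS password. As a minimal sketch, assuming the secret uses the JSON layout Secrets Manager uses for RDS credentials (keys such as username, password, host, port, dbname), the stored value can be turned into a PostgreSQL connection URL as shown below. The helper name, the example values, and the fallback database name are hypothetical; Valohai's own configuration format is not described in this guide.

```python
import json
from urllib.parse import quote


def postgres_url_from_secret(secret_string: str) -> str:
    """Build a postgresql:// URL from an RDS-style secret (hypothetical helper)."""
    secret = json.loads(secret_string)
    # Percent-encode credentials so characters such as '@' or '/' stay valid in a URL.
    user = quote(secret["username"], safe="")
    password = quote(secret["password"], safe="")
    host = secret["host"]
    port = int(secret.get("port", 5432))  # 5432 is the database port used in this guide
    dbname = secret.get("dbname", "postgres")  # assumed fallback when the secret has no dbname
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"


# Hypothetical secret contents, for illustration only.
example = json.dumps({
    "username": "valohai",
    "password": "p@ss/word",
    "host": "valohai-psql.example.internal",
    "port": 5432,
    "dbname": "valohai",
})
print(postgres_url_from_secret(example))
```

With boto3, the input string would come from `secretsmanager.get_secret_value(SecretId=...)["SecretString"]`.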

Prerequisites

From Valohai:

Contact [email protected] to receive:

  • Docker images for the Valohai application

  • Required configuration values and permissions

From your AWS account:

  • Administrator access to the AWS Console or CLI

  • VPC (existing or new)

  • DNS domain name

  • SSL/TLS certificate for HTTPS

Installation Methods

AWS CDK

Deploy using the AWS Cloud Development Kit (CDK).

Repository: github.com/valohai/valohai-cdk-self-hosted

Terraform

Deploy using Terraform scripts.

Repository: github.com/valohai/valohai-self-hosted-aws-tf

Manual Setup

Follow the manual deployment steps below for complete control over the installation.

Manual Deployment

VPC

Valohai can be deployed in your existing VPC or in a new separate VPC.

Security Groups

Create the security groups listed in the "What Gets Deployed" section above.

valohai-sg-workers

  • Inbound: Allow SSH (port 22) from admin IP addresses for debugging (optional)

  • Outbound: Allow all by default; block outbound access if ML jobs must not reach the public internet

valohai-sg-master

  • Port 22 from admin IP addresses

  • Port 80 from valohai-sg-loadbalancer

valohai-sg-database

  • Port 5432 from valohai-sg-master

valohai-sg-queue

  • Port 6379 from valohai-sg-master

  • Port 6379 from valohai-sg-workers

valohai-sg-loadbalancer

  • Port 443 from 0.0.0.0/0 (HTTPS from any address)
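The inbound rules above map onto the `IpPermissions` structure accepted by boto3's EC2 `authorize_security_group_ingress` call. The sketch below only builds those structures as plain dictionaries; the group IDs and the admin CIDR are hypothetical placeholders, and creating the groups themselves (for example with `create_security_group`) is left out.

```python
def from_group(port: int, source_group_id: str) -> dict:
    """Allow TCP traffic on `port` from another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }


def from_cidr(port: int, cidr: str) -> dict:
    """Allow TCP traffic on `port` from an IPv4 CIDR range."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr}],
    }


# Hypothetical IDs: use the IDs AWS assigns when the groups are created.
SG = {
    "workers": "sg-0workers",
    "master": "sg-0master",
    "database": "sg-0database",
    "queue": "sg-0queue",
    "loadbalancer": "sg-0loadbalancer",
}
ADMIN_CIDR = "203.0.113.0/24"  # placeholder admin network (documentation address range)

INGRESS = {
    "workers": [from_cidr(22, ADMIN_CIDR)],  # optional SSH for debugging
    "master": [from_cidr(22, ADMIN_CIDR), from_group(80, SG["loadbalancer"])],
    "database": [from_group(5432, SG["master"])],
    "queue": [from_group(6379, SG["master"]), from_group(6379, SG["workers"])],
    "loadbalancer": [from_cidr(443, "0.0.0.0/0")],
}

for name, permissions in INGRESS.items():
    # With boto3 this would be:
    # ec2.authorize_security_group_ingress(GroupId=SG[name], IpPermissions=permissions)
    print(name, [p["FromPort"] for p in permissions])
```

The master rule uses port 80 as listed above; the rule has to allow whichever port the load balancer actually forwards to, so check it against the load balancer's target port.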

IAM Roles

ValohaiWorker - IAM Role

Default role for all EC2 instances that Valohai launches for ML jobs. The policy below is the minimum set of permissions these instances need.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Action": "autoscaling:SetInstanceProtection",
            "Resource": "*"
        },
        {
            "Sid": "2",
            "Effect": "Allow",
            "Action": "ec2:DescribeInstances",
            "Resource": "*"
        }
    ]
}
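The policy above lists what worker instances may do, but an IAM role used by EC2 instances also needs a trust policy that lets the EC2 service assume it, and it reaches the instances through an instance profile. This guide does not show that part; the sketch below assumes you create the role yourself and builds the standard EC2 trust policy alongside the worker policy as JSON strings.

```python
import json

# Standard trust policy that lets EC2 instances assume the role.
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions from the ValohaiWorker policy above.
WORKER_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "1", "Effect": "Allow", "Action": "autoscaling:SetInstanceProtection", "Resource": "*"},
        {"Sid": "2", "Effect": "Allow", "Action": "ec2:DescribeInstances", "Resource": "*"},
    ],
}

# IAM APIs take policy documents as JSON strings.
trust_json = json.dumps(EC2_TRUST_POLICY, indent=2)
worker_json = json.dumps(WORKER_POLICY, indent=2)
print(trust_json)
```

With boto3, `trust_json` would go to `iam.create_role(AssumeRolePolicyDocument=...)`, `worker_json` to `iam.put_role_policy(PolicyDocument=...)`, and the role would be attached to instances through `iam.create_instance_profile` and `iam.add_role_to_instance_profile`.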

ValohaiMaster - IAM User

Used to create and scale EC2 resources for ML jobs launched by users. This user also has access to the default Valohai S3 bucket and can read secrets from AWS Secrets Manager that are tagged valohai=1. Replace ACCOUNT-ID and "your S3 bucket name" in the policy below with your own values.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "2",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeVpcs",
                "ec2:DescribeKeyPairs",
                "ec2:DescribeImages",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeLaunchTemplates",
                "ec2:DescribeLaunchTemplateVersions",
                "ec2:DescribeInstanceAttribute",
                "ec2:CreateTags",
                "ec2:DescribeInternetGateways",
                "ec2:DescribeRouteTables",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeScalingActivities"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowUpdatingSpotLaunchTemplates",
            "Effect": "Allow",
            "Action": [
                "ec2:CreateLaunchTemplate",
                "ec2:CreateLaunchTemplateVersion",
                "ec2:ModifyLaunchTemplate",
                "ec2:RunInstances",
                "ec2:TerminateInstances",
                "ec2:RebootInstances",
                "autoscaling:UpdateAutoScalingGroup",
                "autoscaling:CreateOrUpdateTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:CreateAutoScalingGroup"
            ],
            "Resource": "*",
            "Condition": {
                "ForAllValues:StringEquals": {
                    "aws:ResourceTag/valohai": "1"
                }
            }
        },
        {
            "Sid": "ServiceLinkedRole",
            "Effect": "Allow",
            "Action": "iam:CreateServiceLinkedRole",
            "Resource": "arn:aws:iam::*:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
        },
        {
            "Sid": "4",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole",
                "iam:GetRole"
            ],
            "Resource": "arn:aws:iam::ACCOUNT-ID:role/ValohaiWorkerRole"
        },
        {
            "Sid": "0",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetResourcePolicy",
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret",
                "secretsmanager:ListSecretVersionIds"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "secretsmanager:ResourceTag/valohai": "1"
                }
            }
        },
        {
            "Sid": "1",
            "Effect": "Allow",
            "Action": "secretsmanager:GetRandomPassword",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::your S3 bucket name",
                "arn:aws:s3:::your S3 bucket name/*"
            ]
        }
    ]
}
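The policy contains two placeholders: ACCOUNT-ID in the worker role ARN and "your S3 bucket name" in the S3 ARNs. The role name in the PassRole ARN must also match the name you actually gave the worker role. As a minimal sketch, assuming the policy is kept as a template string, the hypothetical helper below validates the inputs (with a simplified bucket-name check), substitutes both placeholders, and parses the result so invalid JSON fails early.

```python
import json
import re

# Simplified S3 bucket-name check: 3-63 lowercase letters, digits, dots or hyphens.
BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def render_policy(template: str, account_id: str, bucket: str) -> dict:
    """Fill in ACCOUNT-ID and 'your S3 bucket name' in a policy template (hypothetical helper)."""
    if not re.fullmatch(r"[0-9]{12}", account_id):
        raise ValueError("an AWS account ID is 12 digits")
    if not BUCKET_RE.fullmatch(bucket):
        raise ValueError(f"not a valid S3 bucket name: {bucket!r}")
    rendered = template.replace("ACCOUNT-ID", account_id).replace("your S3 bucket name", bucket)
    return json.loads(rendered)  # raises if the result is not valid JSON


# Shortened excerpt of the ValohaiMaster policy, for illustration only.
TEMPLATE = """{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "4", "Effect": "Allow", "Action": ["iam:PassRole", "iam:GetRole"],
     "Resource": "arn:aws:iam::ACCOUNT-ID:role/ValohaiWorkerRole"},
    {"Effect": "Allow", "Action": "s3:*",
     "Resource": ["arn:aws:s3:::your S3 bucket name", "arn:aws:s3:::your S3 bucket name/*"]}
  ]
}"""

policy = render_policy(TEMPLATE, "123456789012", "yourbucketname-valohai")
print(policy["Statement"][1]["Resource"])
```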

ValohaiMultiPartUploadRole - IAM Role

Used to upload files larger than 5 GB to the S3 bucket with multipart upload.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1503921756000",
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListBucketVersions",
                "s3:ListMultipartUploadParts",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::your S3 bucket name",
                "arn:aws:s3:::your S3 bucket name/*"
            ]
        }
    ]
}
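S3 accepts at most 5 GB in a single PUT request, which is why larger files go through multipart upload. Each part except the last must be between 5 MiB and 5 GiB, and one upload can have at most 10,000 parts. The sketch below is not Valohai's implementation; it only shows how a part size that respects those limits can be chosen and how many parts result. The 100 MiB preferred part size is an arbitrary assumption.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB    # S3 minimum part size (every part except the last)
MAX_PART = 5 * GIB    # S3 maximum part size
MAX_PARTS = 10_000    # S3 maximum number of parts in one upload


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding for large sizes."""
    return -(-a // b)


def plan_multipart(file_size: int, preferred_part: int = 100 * MIB) -> tuple:
    """Return (part_size, part_count) for uploading `file_size` bytes."""
    part_size = min(max(preferred_part, MIN_PART), MAX_PART)
    if ceil_div(file_size, part_size) > MAX_PARTS:
        # Too many parts: grow the part size so the file fits in 10,000 parts.
        part_size = ceil_div(file_size, MAX_PARTS)
    if part_size > MAX_PART:
        raise ValueError("file is too large for a single multipart upload")
    return part_size, ceil_div(file_size, part_size)


print(plan_multipart(6 * GIB))  # (104857600, 62)
```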

Core Valohai Resources

The following tables show the minimum configuration needed for Valohai. Your actual requirements may vary based on your organization's needs.

Compute

| Setting | Value |
|---|---|
| Name | valohai-roi |
| Instance type | m5a.xlarge (4 vCPUs, 16 GB RAM) |
| Operating system | Ubuntu 22.04 LTS |
| Storage | 32 GB |
| Security group | valohai-sg-master |
| Application port | 8000 (served via the load balancer) |

Storage

| Setting | Value |
|---|---|
| Name | yourbucketname-valohai |
| Type | S3 bucket |
| Public access | Blocked |
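"Public access: Blocked" corresponds to S3 Block Public Access. A minimal sketch, assuming boto3 is used: the four settings below are the fields of the `PublicAccessBlockConfiguration` parameter of `put_public_access_block`, and the bucket name is the placeholder from the table.

```python
BUCKET = "yourbucketname-valohai"  # placeholder from the table above

# All four Block Public Access settings enabled.
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def public_access_block_request(bucket: str) -> dict:
    """Keyword arguments for boto3's s3.put_public_access_block()."""
    return {"Bucket": bucket, "PublicAccessBlockConfiguration": dict(PUBLIC_ACCESS_BLOCK)}


request = public_access_block_request(BUCKET)
# With AWS credentials configured:
# import boto3
# boto3.client("s3").put_public_access_block(**request)
print(request)
```

Buckets created since April 2023 have these settings enabled by default, so applying them explicitly mainly matters when reusing an existing bucket.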

Database

| Setting | Value |
|---|---|
| Name | valohai-psql |
| Engine | PostgreSQL 14.2 |
| Instance class | db.t2.large |
| Port | 5432 |
| Security group | valohai-sg-database |
| Public accessibility | No |

Queue

| Setting | Value |
|---|---|
| Name | valohai-queue |
| Service | ElastiCache Redis |
| Engine version | 6.2 |
| Node type | cache.m3.xlarge |
| Number of nodes | 1 |
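Once the database and queue exist, a quick way to confirm that the security group rules work is to open a TCP connection from the valohai-roi instance to ports 5432 and 6379. The sketch below uses only the Python standard library; the endpoint hostnames and the `report` helper are hypothetical, and the hostnames should be replaced with the endpoints AWS shows for valohai-psql and valohai-queue.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


# Hypothetical endpoints: replace with the RDS and ElastiCache endpoints.
CHECKS = {
    "database": ("valohai-psql.example.internal", 5432),
    "queue": ("valohai-queue.example.internal", 6379),
}


def report(checks: dict) -> None:
    """Print one reachability line per endpoint."""
    for name, (host, port) in checks.items():
        status = "reachable" if port_open(host, port) else "NOT reachable"
        print(f"{name} {host}:{port} {status}")
```

Run `report(CHECKS)` on the valohai-roi instance; a "NOT reachable" line usually points to a missing inbound rule on valohai-sg-database or valohai-sg-queue.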

Load Balancer

| Setting | Value |
|---|---|
| Type | Application Load Balancer |
| Protocol | HTTPS (HTTP/2 enabled) |
| Backend target | valohai-roi instance on port 8000 |
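The load balancer terminates HTTPS and forwards traffic to the valohai-roi instance on port 8000. As a sketch, assuming boto3's `elbv2` client is used, these functions build the request parameters for a target group, target registration, an HTTPS listener, and the HTTP/2 attribute. The VPC ID, instance ID, certificate ARN, and target group name are hypothetical placeholders, and the ARNs returned by AWS have to be passed from one call to the next.

```python
VPC_ID = "vpc-0123456789abcdef0"      # placeholder
INSTANCE_ID = "i-0123456789abcdef0"   # placeholder: the valohai-roi instance
CERT_ARN = "arn:aws:acm:region:account:certificate/placeholder"  # placeholder ACM certificate


def target_group_request() -> dict:
    """Parameters for elbv2.create_target_group(): plain HTTP to the app on port 8000."""
    return {
        "Name": "valohai-roi-tg",  # hypothetical name
        "Protocol": "HTTP",
        "Port": 8000,
        "VpcId": VPC_ID,
        "TargetType": "instance",
    }


def register_request(target_group_arn: str) -> dict:
    """Parameters for elbv2.register_targets()."""
    return {"TargetGroupArn": target_group_arn, "Targets": [{"Id": INSTANCE_ID, "Port": 8000}]}


def https_listener_request(load_balancer_arn: str, target_group_arn: str) -> dict:
    """Parameters for elbv2.create_listener(): TLS on 443, forward to the target group."""
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Protocol": "HTTPS",
        "Port": 443,
        "Certificates": [{"CertificateArn": CERT_ARN}],
        "DefaultActions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }


def http2_request(load_balancer_arn: str) -> dict:
    """Parameters for elbv2.modify_load_balancer_attributes(): keep HTTP/2 on."""
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Attributes": [{"Key": "routing.http2.enabled", "Value": "true"}],
    }
```

HTTP/2 is already enabled by default on Application Load Balancers, so the last request only matters if the attribute was turned off.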

DNS

Create a DNS name (for example, valohai.yourdomain.net) that points to the load balancer, so users can reach the web application.
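If the domain is hosted in Amazon Route 53 (an assumption; any DNS provider works), the record can be an alias A record that targets the load balancer. The sketch below builds the change batch for boto3's `route53.change_resource_record_sets`. The alias target's hosted zone ID is the load balancer's canonical hosted zone ID (returned as `CanonicalHostedZoneId` by `elbv2.describe_load_balancers`), not the ID of your own zone. All values shown are placeholders.

```python
def alias_change_batch(record_name: str, lb_dns_name: str, lb_zone_id: str) -> dict:
    """ChangeBatch that points `record_name` at an Application Load Balancer."""
    return {
        "Comment": "Point the Valohai web address at the load balancer",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "AliasTarget": {
                        "HostedZoneId": lb_zone_id,  # the ALB's canonical hosted zone ID
                        "DNSName": lb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }


batch = alias_change_batch(
    "valohai.yourdomain.net",
    "valohai-alb-123456.eu-west-1.elb.amazonaws.com",  # placeholder ALB DNS name
    "Z0000000PLACEHOLDER",                              # placeholder zone ID
)
# route53.change_resource_record_sets(HostedZoneId=<your hosted zone ID>, ChangeBatch=batch)
print(batch["Changes"][0]["ResourceRecordSet"]["Name"])
```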

Next Steps

After deployment, contact [email protected] to complete the configuration and initialize the application.
