# Self-Hosted on EC2

Deploy the complete Valohai platform inside your AWS environment. This guide covers installing both the Application Layer and the Compute & Data Layer in your AWS account.

A self-hosted installation runs every Valohai component inside your own network. Users access your installation instead of app.valohai.com.

Updates to the platform are delivered through Docker images.

> **Consider hybrid first:** Most organizations use [hybrid deployment](https://github.com/valohai/dokuhai/blob/main/hybrid.md) where Valohai manages the application layer. Self-hosted requires additional operational overhead.

## What Gets Deployed

The following tables show the default configuration values.

### Security Groups

<table><thead><tr><th width="141.55078125">Security Group</th><th width="163.046875">Purpose</th><th>Inbound Rules</th><th>Outbound Rules</th></tr></thead><tbody><tr><td><strong>valohai-sg-workers</strong></td><td>Worker instances executing ML jobs</td><td>SSH from admin IPs (optional, for debugging)</td><td>Allow all (or block if ML jobs should not access public internet)</td></tr><tr><td><strong>valohai-sg-master</strong></td><td>Valohai web application instance</td><td>• Port 22 from admin IPs<br>• Port 80 from load balancer</td><td>Access to database, queue, and internet</td></tr><tr><td><strong>valohai-sg-database</strong></td><td>PostgreSQL database</td><td>Port 5432 from valohai-sg-master</td><td>None</td></tr><tr><td><strong>valohai-sg-queue</strong></td><td>Redis queue</td><td>Port 6379 from valohai-sg-master and valohai-sg-workers</td><td>None</td></tr><tr><td><strong>valohai-sg-loadbalancer</strong></td><td>Application load balancer</td><td>Port 443 from 0.0.0.0/0 (HTTPS from users)</td><td>Port 80 to valohai-sg-master</td></tr></tbody></table>

### Core Services <a href="#core-services" id="core-services"></a>

<table><thead><tr><th width="173.58203125">Service</th><th width="272.63671875">Purpose</th><th>Specifications</th></tr></thead><tbody><tr><td><strong>EC2 Instance</strong><br>(<code>valohai-roi</code>)</td><td>Hosts Valohai web application, deployment image building, and scaling services</td><td>• Instance type: m5a.xlarge (4 vCPUs, 16GB RAM)<br>• OS: Ubuntu 22.04 LTS<br>• Storage: 32GB</td></tr><tr><td><strong>RDS PostgreSQL</strong></td><td>Stores user data and execution metadata (who ran what, when, with which configuration)<br><br>Does not store actual content (data, code, models)</td><td>• Instance class: db.t2.large minimum<br>• Engine: PostgreSQL 14.2</td></tr><tr><td><strong>ElastiCache Redis</strong></td><td>Job queue and short-term execution log storage</td><td>• Node type: cache.m3.xlarge<br>• Engine: Redis 6.2</td></tr><tr><td><strong>Application Load Balancer</strong></td><td>HTTPS endpoint for user access</td><td>• HTTP/2 enabled<br>• Routes traffic to valohai-roi instance</td></tr></tbody></table>

### IAM Roles <a href="#iam-roles" id="iam-roles"></a>

<table><thead><tr><th width="133.7265625">Role</th><th width="231.1875">Attached To</th><th>Permissions</th></tr></thead><tbody><tr><td><strong>valohai-master-role</strong></td><td>EC2 instance running Valohai web app</td><td>• Create and edit autoscaling groups<br>• Launch and terminate EC2 instances for ML jobs<br>• Upload and download files from default S3 storage<br>• Access Valohai secrets from Secrets Manager</td></tr><tr><td><strong>valohai-worker-role</strong></td><td>EC2 instances running ML jobs</td><td>• Set instance protection<br>• Describe its own instance metadata</td></tr><tr><td><strong>valohai-multipart-role</strong></td><td>Web app for large file uploads</td><td>• Multipart upload operations to S3 (for files >5GB)</td></tr></tbody></table>

### Storage <a href="#storage" id="storage"></a>

<table><thead><tr><th width="126.2734375">Resource</th><th width="170.26953125">Purpose</th><th>Details</th></tr></thead><tbody><tr><td><strong>S3 Bucket</strong></td><td>Artifact and code storage</td><td>• Git commit snapshots for reproducibility<br>• Execution logs (moved from Redis after completion)<br>• Input datasets and output artifacts</td></tr></tbody></table>

### Other Resources <a href="#other-resources" id="other-resources"></a>

| Resource                    | Purpose                                               |
| --------------------------- | ----------------------------------------------------- |
| **AWS Secrets Manager**     | Stores RDS password and Valohai configuration secrets |
| **AWS SSM Parameter Store** | Stores configuration details                          |

## Prerequisites

**From Valohai:**

Contact **<support@valohai.com>** to receive:

* Docker images for the Valohai application
* Required configuration values and permissions

**From your AWS account:**

* Admin access to AWS Console or CLI
* VPC (existing or new)
* DNS domain name
* SSL/TLS certificate for HTTPS

## Installation Methods

### AWS CDK

Deploy using AWS Cloud Development Kit.

**Repository:** [github.com/valohai/valohai-cdk-self-hosted](https://github.com/valohai/valohai-cdk-self-hosted)

### Terraform

Deploy using Terraform scripts.

**Repository:** [github.com/valohai/valohai-self-hosted-aws-tf](https://github.com/valohai/valohai-self-hosted-aws-tf)

### Manual Setup

Follow the manual deployment steps below for complete control over the installation.

## Manual Deployment

### VPC

Valohai can be deployed in your existing VPC or in a new separate VPC.

### Security Groups

Create the security groups listed in the "What Gets Deployed" section above.

#### valohai-sg-workers

* Inbound: SSH (port 22) from admin IP addresses; optional, only needed for debugging
* Outbound: Allow all, or block outbound traffic if ML jobs must not reach the public internet

#### valohai-sg-master

* Port 22 from admin IP addresses
* Port 80 from valohai-sg-loadbalancer

#### valohai-sg-database

* Port 5432 from valohai-sg-master

#### valohai-sg-queue

* Port 6379 from valohai-sg-master
* Port 6379 from valohai-sg-workers

#### valohai-sg-loadbalancer

* Port 443 from 0.0.0.0/0 (HTTPS from any source address)
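
The inbound rules above can be written down as data before anything is created. The following is a minimal sketch, assuming the groups are managed from Python; the helper names, the admin CIDR, and the `sg-...` IDs are hypothetical placeholders, and the dictionaries follow the `IpPermissions` shape that the EC2 API uses for ingress rules.

```python
# Sketch only: the inbound rules from this section as EC2 "IpPermissions"
# dictionaries. Nothing here calls AWS. The admin CIDR and the sg-... IDs
# are hypothetical placeholders.

ADMIN_CIDR = "203.0.113.0/24"  # assumption: replace with your admin network


def tcp_from_cidr(port, cidr):
    """Allow one TCP port from an IPv4 CIDR block."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr}],
    }


def tcp_from_group(port, group_id):
    """Allow one TCP port from members of another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": group_id}],
    }


def build_ingress_rules(group_ids):
    """Map each security group name to its list of inbound rules."""
    lb = group_ids["valohai-sg-loadbalancer"]
    master = group_ids["valohai-sg-master"]
    workers = group_ids["valohai-sg-workers"]
    return {
        # Optional rule, only needed for debugging worker instances.
        "valohai-sg-workers": [tcp_from_cidr(22, ADMIN_CIDR)],
        "valohai-sg-master": [
            tcp_from_cidr(22, ADMIN_CIDR),
            tcp_from_group(80, lb),
        ],
        "valohai-sg-database": [tcp_from_group(5432, master)],
        "valohai-sg-queue": [
            tcp_from_group(6379, master),
            tcp_from_group(6379, workers),
        ],
        "valohai-sg-loadbalancer": [tcp_from_cidr(443, "0.0.0.0/0")],
    }


# Hypothetical IDs standing in for the values AWS returns on group creation.
example_ids = {
    "valohai-sg-workers": "sg-0000000000000001",
    "valohai-sg-master": "sg-0000000000000002",
    "valohai-sg-database": "sg-0000000000000003",
    "valohai-sg-queue": "sg-0000000000000004",
    "valohai-sg-loadbalancer": "sg-0000000000000005",
}
rules = build_ingress_rules(example_ids)
```

If you use boto3, each list could be passed as the `IpPermissions` argument of `authorize_security_group_ingress` for the matching group.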

### IAM Roles

#### ValohaiWorker - IAM Role

Default role for all EC2 instances that Valohai launches for ML jobs. The policy below contains the minimum permissions the role needs.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "1",
      "Effect": "Allow",
      "Action": "autoscaling:SetInstanceProtection",
      "Resource": "*"
    },
    {
      "Sid": "2",
      "Effect": "Allow",
      "Action": "ec2:DescribeInstances",
      "Resource": "*"
    }
  ]
}
```
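
The JSON above is a permissions policy. A role that EC2 instances assume also needs a trust policy, which this guide does not show. The sketch below builds the standard EC2 trust relationship as an assumption about what the worker role would use; the function name is hypothetical, and you should check it against the values Valohai provides.

```python
import json


def ec2_trust_policy():
    """Trust policy that lets the EC2 service assume a role.

    Assumption: this is the generic EC2 trust relationship, not a document
    supplied by Valohai.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Serialized form, as IAM expects the document as a JSON string.
trust_policy_json = json.dumps(ec2_trust_policy(), indent=2)
```

With boto3, the serialized string could be supplied as `AssumeRolePolicyDocument` when calling `create_role`.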

#### ValohaiMaster - IAM User

Used for creating and scaling the EC2 resources that run ML jobs launched by users. This user also has full access to the default Valohai S3 bucket and can read secrets in AWS Secrets Manager that carry the tag `valohai` with the value `1`.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "2",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeVpcs",
        "ec2:DescribeKeyPairs",
        "ec2:DescribeImages",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:DescribeInstanceAttribute",
        "ec2:CreateTags",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeRouteTables",
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeScalingActivities"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowUpdatingSpotLaunchTemplates",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateLaunchTemplate",
        "ec2:CreateLaunchTemplateVersion",
        "ec2:ModifyLaunchTemplate",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "ec2:RebootInstances",
        "autoscaling:UpdateAutoScalingGroup",
        "autoscaling:CreateOrUpdateTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:CreateAutoScalingGroup"
      ],
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringEquals": {
          "aws:ResourceTag/valohai": "1"
        }
      }
    },
    {
      "Sid": "ServiceLinkedRole",
      "Effect": "Allow",
      "Action": "iam:CreateServiceLinkedRole",
      "Resource": "arn:aws:iam::*:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
    },
    {
      "Sid": "4",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole",
        "iam:GetRole"
      ],
      "Resource": "arn:aws:iam::ACCOUNT-ID:role/ValohaiWorkerRole"
    },
    {
      "Sid": "0",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetResourcePolicy",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "secretsmanager:ResourceTag/valohai": "1"
        }
      }
    },
    {
      "Action": "secretsmanager:GetRandomPassword",
      "Resource": "*",
      "Effect": "Allow",
      "Sid": "1"
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::your S3 bucket name",
        "arn:aws:s3:::your S3 bucket name/*"
      ]
    }
  ]
}
```
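
The policy contains two placeholders, `ACCOUNT-ID` and `your S3 bucket name`, that must be replaced before the policy is attached. The following is a minimal sketch of one way to do that and to confirm the result is still valid JSON; `render_policy` is a hypothetical helper, the template is abbreviated to the two statements that contain placeholders, and the account ID is a made-up example.

```python
import json

# Abbreviated stand-in for the policy above: only the statements that
# contain placeholders are reproduced here.
POLICY_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "4",
      "Effect": "Allow",
      "Action": ["iam:PassRole", "iam:GetRole"],
      "Resource": "arn:aws:iam::ACCOUNT-ID:role/ValohaiWorkerRole"
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::your S3 bucket name",
        "arn:aws:s3:::your S3 bucket name/*"
      ]
    }
  ]
}
"""


def render_policy(template, account_id, bucket_name):
    """Fill in the placeholders and return the parsed policy."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    rendered = (
        template
        .replace("ACCOUNT-ID", account_id)
        .replace("your S3 bucket name", bucket_name)
    )
    # json.loads raises an error if the substitution broke the document.
    return json.loads(rendered)


# Example values only; the account ID is fictional.
policy = render_policy(POLICY_TEMPLATE, "123456789012", "yourbucketname-valohai")
```

The same substitution applies to the bucket placeholders in the multipart upload policy.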

#### ValohaiMultiPartUploadRole - IAM Role

Used to upload files larger than 5 GB to the S3 bucket.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1503921756000",
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions",
        "s3:ListMultipartUploadParts",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::your S3 bucket name",
        "arn:aws:s3:::your S3 bucket name/*"
      ]
    }
  ]
}
```

### Core Valohai Resources

The following tables show the minimum configuration needed for Valohai. Your actual requirements may vary based on your organization's needs.

#### Compute <a href="#compute" id="compute"></a>

<table><thead><tr><th width="213.09765625">Resource</th><th>Configuration</th></tr></thead><tbody><tr><td><strong>Name</strong></td><td>valohai-roi</td></tr><tr><td><strong>Instance type</strong></td><td>m5a.xlarge (4 vCPUs, 16GB RAM)</td></tr><tr><td><strong>Operating system</strong></td><td>Ubuntu 22.04 LTS</td></tr><tr><td><strong>Storage</strong></td><td>32GB</td></tr><tr><td><strong>Security group</strong></td><td>valohai-sg-master</td></tr><tr><td><strong>Application port</strong></td><td>8000 (served via load balancer)</td></tr></tbody></table>

#### Storage <a href="#storage" id="storage"></a>

<table><thead><tr><th width="208.28515625">Resource</th><th>Configuration</th></tr></thead><tbody><tr><td><strong>Name</strong></td><td>yourbucketname-valohai</td></tr><tr><td><strong>Type</strong></td><td>S3 bucket</td></tr><tr><td><strong>Public access</strong></td><td>Blocked</td></tr></tbody></table>
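
"Public access: Blocked" corresponds to S3 Block Public Access, which has four separate settings. As a hedged sketch, the configuration below enables all four; `is_fully_blocked` is a hypothetical helper for checking a configuration you read back from the bucket.

```python
# All four S3 Block Public Access settings enabled.
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

REQUIRED_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def is_fully_blocked(config):
    """Return True only if every Block Public Access setting is enabled."""
    return all(config.get(key) is True for key in REQUIRED_SETTINGS)
```

With boto3, the dictionary could be passed as `PublicAccessBlockConfiguration` to `put_public_access_block` for the bucket.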

#### Database <a href="#database" id="database"></a>

<table><thead><tr><th width="208.7265625">Resource</th><th>Configuration</th></tr></thead><tbody><tr><td><strong>Name</strong></td><td>valohai-psql</td></tr><tr><td><strong>Engine</strong></td><td>PostgreSQL 14.2</td></tr><tr><td><strong>Instance class</strong></td><td>db.t2.large</td></tr><tr><td><strong>Port</strong></td><td>5432</td></tr><tr><td><strong>Security group</strong></td><td>valohai-sg-database</td></tr><tr><td><strong>Public accessibility</strong></td><td>No</td></tr></tbody></table>

#### Queue <a href="#queue" id="queue"></a>

<table><thead><tr><th width="204.5703125">Resource</th><th>Configuration</th></tr></thead><tbody><tr><td><strong>Name</strong></td><td>valohai-queue</td></tr><tr><td><strong>Service</strong></td><td>ElastiCache Redis</td></tr><tr><td><strong>Engine version</strong></td><td>6.2</td></tr><tr><td><strong>Node type</strong></td><td>cache.m3.xlarge</td></tr><tr><td><strong>Number of nodes</strong></td><td>1</td></tr></tbody></table>

#### Load Balancer <a href="#load-balancer" id="load-balancer"></a>

<table><thead><tr><th width="199.42578125">Resource</th><th>Configuration</th></tr></thead><tbody><tr><td><strong>Type</strong></td><td>Application Load Balancer</td></tr><tr><td><strong>Protocol</strong></td><td>HTTPS (HTTP/2 enabled)</td></tr><tr><td><strong>Backend target</strong></td><td>valohai-roi instance on port 8000</td></tr></tbody></table>

### DNS <a href="#dns" id="dns"></a>

Provide a DNS name to point at the load balancer for web application access (e.g., `valohai.yourdomain.net`).
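
If the domain is hosted in Amazon Route 53 (an assumption; any DNS provider works), the record can be an alias that targets the load balancer. The sketch below builds the change batch as plain data; the function name, the load balancer DNS name, and the zone ID are hypothetical placeholders.

```python
def alias_change_batch(record_name, alb_dns_name, alb_zone_id):
    """Build a Route 53 change batch with an alias A record for the ALB.

    `alb_zone_id` is the load balancer's own canonical hosted zone ID,
    not the ID of your domain's hosted zone.
    """
    return {
        "Comment": "Point the Valohai hostname at the load balancer",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "AliasTarget": {
                        "HostedZoneId": alb_zone_id,
                        "DNSName": alb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }


# Placeholder values for illustration only.
batch = alias_change_batch(
    "valohai.yourdomain.net",
    "valohai-alb-123456789.eu-west-1.elb.amazonaws.com",
    "Z0000000EXAMPLE",
)
```

With boto3, the result could be passed as `ChangeBatch` to `change_resource_record_sets`, together with the hosted zone ID of your domain.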

## Next Steps

After deployment, contact <support@valohai.com> to complete the configuration and initialize the application.
