Preparing your AWS account for a Valohai self-hosted trial

This document prepares your AWS account for a Valohai self-hosted trial.

Select the correct region

Select the appropriate region for the resources:

  • Consider using the same region where your data is located to reduce data transfer times.

  • Consider using a region where you’ve already acquired GPU quota from AWS.

  • When selecting your region, note that regions offer different sets of GPU instance types.
    • For US customers, we recommend US West 2 (Oregon, us-west-2), as it has the widest range of GPU instance types in the United States.

    • For EU customers, we recommend EU West 1 (Ireland, eu-west-1), as it has the widest range of GPU instance types in Europe.

Setting up a sub account (optional)

You can create a dedicated sub account in AWS for all the Valohai resources. This sub account separates the Valohai resources from all other AWS services you might be using.

If you use a sub account, create all of the following IAM access control entities in it.

Create an S3 bucket

Create an S3 bucket through AWS console (https://s3.console.aws.amazon.com/s3/home).

Select bucket name and region

  1. Throughout this guide, we will assume the name of the bucket is valohai-bucket; be sure to replace this with the actual name of your bucket when copying in any example configuration!

  2. Create the bucket in the region where you’ll run your training, to minimize data transfer costs. If you don’t have a preference, we recommend Ireland (eu-west-1), as most of our computation resides there.
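If you want to sanity-check a bucket name before creating it, the Python sketch below applies a subset of the Amazon S3 bucket naming rules: 3 to 63 characters, only lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit, no adjacent periods, not formatted as an IP address, and no "xn--" prefix. It is not an exhaustive validator, the function name bucket_name_problems is hypothetical, and the S3 console remains the final authority.

```python
import ipaddress
import re

# Subset of the S3 bucket naming rules (not exhaustive).
_ALLOWED = re.compile(r"^[a-z0-9][a-z0-9.-]*[a-z0-9]$")


def bucket_name_problems(name: str) -> list[str]:
    """Return the rule violations found; an empty list means these checks passed."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("must be 3-63 characters long")
    if not _ALLOWED.match(name):
        problems.append(
            "only lowercase letters, digits, dots and hyphens; "
            "must start and end with a letter or digit"
        )
    if ".." in name:
        problems.append("must not contain two adjacent periods")
    try:
        ipaddress.IPv4Address(name)  # succeeds only for names like 192.168.1.1
        problems.append("must not be formatted as an IP address")
    except ValueError:
        pass
    if name.startswith("xn--"):
        problems.append("must not start with 'xn--'")
    return problems


if __name__ == "__main__":
    for candidate in ("valohai-bucket", "Valohai_Bucket", "192.168.1.1"):
        print(candidate, bucket_name_problems(candidate) or "looks OK")
```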

Use default bucket properties & permissions

Default bucket properties are fine, but double-check that your bucket is not public (leave Block all public access enabled). You can, of course, adjust the default settings to suit your needs.

Create a new bucket.

Creating IAM Entities

To avoid Valohai needing access to your AWS account’s authentication and authorization settings, create the following entities yourself under IAM.

IAM Role “ValohaiWorkerRole”

This role is assigned to the individual EC2 instances (the machine learning job workers) so that they can query information about themselves and set instance protection, which prevents the Auto Scaling group from inadvertently terminating them while they are processing workloads.

Create an IAM policy

Start by creating a policy that you’ll attach to the role:

  • Open the AWS Console and navigate to IAM -> Policies

  • Click Create policy

  • Paste the JSON below into the JSON policy editor

  • Add a tag valohai with value 1

  • Give the policy a name ValohaiWorkerPolicy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Action": "autoscaling:SetInstanceProtection",
            "Resource": "*"
        },
        {
            "Sid": "2",
            "Effect": "Allow",
            "Action": "ec2:DescribeInstances",
            "Resource": "*"
        }
    ]
}
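Before pasting a policy into the console, you can run a quick structural check locally. The Python sketch below is far less thorough than IAM’s own policy validation, and the helper name check_policy is hypothetical. It parses the JSON, expects Version 2012-10-17, flags duplicate Sid values (IAM requires each Sid to be unique within a policy), and reports statements missing Effect, Action or Resource. It assumes statements use Action and Resource rather than NotAction or NotResource.

```python
import json
from collections import Counter

# The ValohaiWorkerPolicy from above, as the text you would paste into the console.
WORKER_POLICY = """
{
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "1", "Effect": "Allow", "Action": "autoscaling:SetInstanceProtection", "Resource": "*"},
        {"Sid": "2", "Effect": "Allow", "Action": "ec2:DescribeInstances", "Resource": "*"}
    ]
}
"""


def check_policy(document: str) -> list[str]:
    """Return structural problems found in an IAM policy document (empty list if none)."""
    policy = json.loads(document)  # raises json.JSONDecodeError on malformed JSON
    problems = []
    if policy.get("Version") != "2012-10-17":
        problems.append("Version should be 2012-10-17")
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    # Sid values must be unique within one IAM policy.
    sid_counts = Counter(s["Sid"] for s in statements if "Sid" in s)
    problems += [f"duplicate Sid {sid!r}" for sid, n in sid_counts.items() if n > 1]
    for index, statement in enumerate(statements):
        for key in ("Effect", "Action", "Resource"):
            if key not in statement:
                problems.append(f"statement {index} is missing {key}")
    return problems


if __name__ == "__main__":
    print(check_policy(WORKER_POLICY) or "no problems found")
```

The same function can be pointed at the ValohaiMasterPolicy text before you paste it.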

Create an IAM role

  • Open the AWS Console and navigate to IAM -> Roles

  • Click Create role

  • Choose AWS service as the trusted entity type and EC2 as the use case

  • Find and attach the ValohaiWorkerPolicy policy

  • Name the role ValohaiWorkerRole

  • Add a tag valohai with value 1

  • Click Create role to finish

Copy the Role ARN shown for the newly created role. You’ll need this in the next step.

Note: Instance profile

If you use the AWS Management Console to create the ValohaiWorkerRole, the console automatically creates an instance profile with the same name as the role. If you use the AWS CLI or APIs to create the role, you need to create an instance profile manually and add the role to it. Read more at [AWS: Using instance profiles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html)
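When you choose EC2 as the use case in the console, IAM gives the role a trust policy that allows the EC2 service to assume it. If you create the role with the CLI or APIs instead, you supply that trust policy yourself. The Python sketch below only builds and prints the standard EC2 trust policy; the function name ec2_trust_policy is hypothetical.

```python
import json


def ec2_trust_policy() -> dict:
    """Trust (assume-role) policy that lets EC2 instances assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(ec2_trust_policy(), indent=2))
```

You could save the printed JSON to a file, pass it to aws iam create-role with --assume-role-policy-document, and then use aws iam create-instance-profile and aws iam add-role-to-instance-profile to create the instance profile the console would otherwise create for you.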

IAM User for ValohaiMaster

This IAM user provides the credentials that the Valohai web application (https://app.valohai.com/) and its scaling services use to:

  1. see how many Valohai-related instances are running

  2. scale worker clusters up and down

  3. create launch templates and Auto Scaling groups, one for each instance type

  4. let the organization admin adjust the maximum price for spot instances through app.valohai.com

Create an IAM policy

Start by creating a policy that defines the permissions of the ValohaiMaster user:

  • Open the AWS Console and navigate to IAM -> Policies

  • Click Create policy

  • Paste the JSON below into the JSON policy editor

  • Add a tag valohai with value 1

  • Give the policy a name ValohaiMasterPolicy

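Important

Replace <YOUR-AWS-ACCOUNT-ID> in the ValohaiWorkerRole ARN with your own AWS account ID (or paste the Role ARN you copied earlier), and replace valohai-bucket with the name of your bucket.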

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "2",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeVpcs",
                "ec2:DescribeKeyPairs",
                "ec2:DescribeImages",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeLaunchTemplates",
                "ec2:CreateTags",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeScalingActivities"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowUpdatingSpotLaunchTemplates",
            "Effect": "Allow",
            "Action": [
                "ec2:CreateLaunchTemplate",
                "ec2:CreateLaunchTemplateVersion",
                "ec2:ModifyLaunchTemplate",
                "ec2:RunInstances",
                "autoscaling:UpdateAutoScalingGroup",
                "autoscaling:CreateOrUpdateTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:CreateAutoScalingGroup"
            ],
            "Resource": "*",
            "Condition": {
                "ForAllValues:StringEquals": {
                    "aws:ResourceTag/Valohai": "1"
                }
            }
        },
        {
            "Sid": "ServiceLinkedRole",
            "Effect": "Allow",
            "Action": "iam:CreateServiceLinkedRole",
            "Resource": "arn:aws:iam::*:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
        },
        {
            "Sid": "4",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole",
                "iam:GetRole"
            ],
            "Resource": "arn:aws:iam::<YOUR-AWS-ACCOUNT-ID>:role/ValohaiWorkerRole"
        },
        {
            "Sid": "5",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::valohai-bucket",
                "arn:aws:s3:::valohai-bucket/*"
            ]
        }
    ]
}
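Filling in the placeholders by hand is easy to get wrong. The Python sketch below substitutes a 12-digit account ID and your bucket name into the policy text and checks that the result is still valid JSON. The function name fill_master_policy, the account ID 123456789012 and the bucket name acme-ml-data are hypothetical, and TEMPLATE_EXCERPT holds only the two statements with placeholders; in practice you would pass the full policy text.

```python
import json
import re

# Excerpt of the ValohaiMasterPolicy: only the statements that contain placeholders.
TEMPLATE_EXCERPT = """
{
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["iam:PassRole", "iam:GetRole"],
         "Resource": "arn:aws:iam::<YOUR-AWS-ACCOUNT-ID>:role/ValohaiWorkerRole"},
        {"Effect": "Allow", "Action": "s3:*",
         "Resource": ["arn:aws:s3:::valohai-bucket", "arn:aws:s3:::valohai-bucket/*"]}
    ]
}
"""


def fill_master_policy(template: str, account_id: str, bucket: str) -> str:
    """Substitute the AWS account ID and bucket name into the policy text."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("an AWS account ID is exactly 12 digits")
    filled = template.replace("<YOUR-AWS-ACCOUNT-ID>", account_id)
    # Covers both the bucket ARN and the bucket/* (all objects) ARN.
    filled = filled.replace("arn:aws:s3:::valohai-bucket", f"arn:aws:s3:::{bucket}")
    json.loads(filled)  # fail loudly if the result is no longer valid JSON
    return filled


if __name__ == "__main__":
    print(fill_master_policy(TEMPLATE_EXCERPT, "123456789012", "acme-ml-data"))
```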

Create the IAM user

  • Open the AWS Console and navigate to IAM -> Users

  • Create a new user called ValohaiMaster

  • Choose Attach policies directly

  • Find and attach the ValohaiMasterPolicy policy

  • Add a tag valohai with value 1

  • After the user is created, open its Security credentials tab and create an access key

You’ll need the access key ID and secret access key during the installation so that the Valohai application can scale EC2 resources in your AWS account.

Setting up Valohai resources

Below is a list of the AWS resources that are required for the self-hosted Valohai installation.

Optional: VPC and subnets

You can use your existing VPC, or create a new VPC with one subnet in each availability zone you want to use. For example:

  • VPC
    • Name: valohai-vpc

    • CIDR: 10.0.0.0/16

  • One subnet per availability zone, for example:
    • Subnet: valohai-subnet-1, 10.0.0.0/20

    • Subnet: valohai-subnet-2, 10.0.16.0/20

    • Subnet: valohai-subnet-3, 10.0.32.0/20

    • Subnet: valohai-subnet-4, 10.0.48.0/20

  • Internet Gateway
    • Name: valohai-igw

    • Attach this Internet Gateway to valohai-vpc

  • Route table: rename the main route table of valohai-vpc to valohai-rt
    • Edit routes:
      • 10.0.0.0/16 -> local

      • 0.0.0.0/0 -> valohai-igw
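The example subnets are consecutive /20 blocks carved out of the /16 VPC range. If you pick a different VPC CIDR or subnet size, Python’s standard ipaddress module can compute non-overlapping ranges for you. The sketch below is illustrative; the helper names plan_subnets and usable_addresses are hypothetical, and it relies on AWS reserving five addresses (the first four and the last) in every subnet.

```python
import ipaddress
from itertools import islice

# AWS reserves the first four and the last IP address in every subnet.
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr: str, count: int, new_prefix: int = 20) -> list[str]:
    """Split the VPC CIDR into `count` consecutive, non-overlapping subnets."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [str(net) for net in islice(vpc.subnets(new_prefix=new_prefix), count)]
    if len(subnets) < count:
        raise ValueError(f"{vpc_cidr} only fits {len(subnets)} /{new_prefix} subnets")
    return subnets


def usable_addresses(subnet_cidr: str) -> int:
    """Number of addresses left for instances after the AWS reservations."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    for i, cidr in enumerate(plan_subnets("10.0.0.0/16", 4), start=1):
        print(f"valohai-subnet-{i}: {cidr} ({usable_addresses(cidr)} usable addresses)")
```

For the example VPC this reproduces the four /20 subnets listed above, each with 4,091 usable addresses.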

Security groups

Create a new security group named valohai-sg-workers and add the following inbound rule:

| Protocol | Port | Source | Description |
|---|---|---|---|
| TCP | 22 | 3.251.38.215/32 (during installation) | for SSH management from Valohai |

Create a new security group named valohai-sg-master and add the following inbound rules:

| Protocol | Port | Source | Description |
|---|---|---|---|
| TCP | 6379 | valohai-sg-workers | for plain Redis connection from workers |
| TCP | 22 | 3.251.38.215/32 (during installation) | for SSH management from Valohai |
| TCP | 80 | 0.0.0.0/0 | for access to web app |
| TCP | 443 | 0.0.0.0/0 | for access to web app |
| TCP | 8811 | 0.0.0.0/0 | for access to hosted notebooks |
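If you prefer to script the security groups, the inbound rules above correspond to the IpPermissions structure accepted by the EC2 AuthorizeSecurityGroupIngress API (for example boto3’s authorize_security_group_ingress(GroupId=..., IpPermissions=...)). The sketch below only builds those structures in plain Python and makes no API calls; the helper names and the security group ID sg-0123456789abcdef0 are hypothetical placeholders.

```python
VALOHAI_INSTALL_CIDR = "3.251.38.215/32"  # SSH source from the tables above; only needed during installation
ANYWHERE = "0.0.0.0/0"


def cidr_rule(port: int, cidr: str, description: str) -> dict:
    """Allow inbound TCP on one port from an IPv4 CIDR range."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": description}],
    }


def group_rule(port: int, source_group_id: str, description: str) -> dict:
    """Allow inbound TCP on one port from members of another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_group_id, "Description": description}],
    }


def worker_rules() -> list[dict]:
    """Inbound rules for valohai-sg-workers."""
    return [cidr_rule(22, VALOHAI_INSTALL_CIDR, "SSH management from Valohai")]


def master_rules(workers_group_id: str) -> list[dict]:
    """Inbound rules for valohai-sg-master; Redis is only open to the workers group."""
    return [
        group_rule(6379, workers_group_id, "Plain Redis connection from workers"),
        cidr_rule(22, VALOHAI_INSTALL_CIDR, "SSH management from Valohai"),
        cidr_rule(80, ANYWHERE, "Access to web app"),
        cidr_rule(443, ANYWHERE, "Access to web app"),
        cidr_rule(8811, ANYWHERE, "Access to hosted notebooks"),
    ]


if __name__ == "__main__":
    # Hypothetical ID; use the real ID of valohai-sg-workers.
    for rule in master_rules("sg-0123456789abcdef0"):
        print(rule)
```

Referencing valohai-sg-workers by group ID, rather than by CIDR, keeps the Redis port closed to everything except instances in the workers group.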

EC2 Instance for Valohai master

Provision an Elastic IP and an EC2 instance to host the job queue and short-term logs.

  • Elastic IP from the Amazon pool
    • Name: valohai-ip-master

  • EC2 instance: works as the master instance for Valohai and hosts all the core Valohai services.
    • Name: valohai-i-master

    • OS: Ubuntu 20.04 LTS

    • Machine type: t3.xlarge (4 vCPU, 16GB RAM)

    • Root EBS volume: 200 GB

    • Security Group: valohai-sg-master

    • Key Pair: You’ll receive the key pair from your Valohai contact

    • Tag: Valohai

Associate the Elastic IP with the new EC2 instance.
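For reference, the master instance specification above maps onto parameters of the EC2 RunInstances API. The Python sketch below only assembles those parameters as a dictionary and makes no API calls. The helper name master_instance_params and all IDs in it are hypothetical placeholders, and the root device name /dev/sda1 is an assumption that you should check against the details of the Ubuntu 20.04 AMI you choose.

```python
def master_instance_params(ami_id: str, key_name: str, security_group_id: str, subnet_id: str) -> dict:
    """Keyword arguments in the shape of the EC2 RunInstances API for the Valohai master."""
    return {
        "ImageId": ami_id,  # an Ubuntu 20.04 LTS AMI for your region
        "InstanceType": "t3.xlarge",  # 4 vCPU, 16 GiB RAM
        "KeyName": key_name,  # the key pair from your Valohai contact
        "SecurityGroupIds": [security_group_id],  # ID of valohai-sg-master
        "SubnetId": subnet_id,
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [
            # Assumes the AMI's root device is /dev/sda1; VolumeSize is in GiB.
            {"DeviceName": "/dev/sda1", "Ebs": {"VolumeSize": 200}},
        ],
        "TagSpecifications": [
            {"ResourceType": "instance", "Tags": [{"Key": "Name", "Value": "valohai-i-master"}]},
        ],
    }


if __name__ == "__main__":
    params = master_instance_params(
        "ami-00000000000000000",  # hypothetical AMI ID
        "valohai-key",  # hypothetical key pair name
        "sg-0123456789abcdef0",  # hypothetical security group ID
        "subnet-0123456789abcdef0",  # hypothetical subnet ID
    )
    print(params)
```

With boto3 these would be passed as ec2.run_instances(**params), after which the Elastic IP can be associated with associate_address(AllocationId=..., InstanceId=...).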

Conclusion

You should now have the following details:

  • Region

  • S3 Bucket for Valohai

  • IAM User for ValohaiMaster (inc. Access Key and Secret)

  • ARN of the ValohaiWorkerRole IAM role

  • Name of VPC for Valohai workers

  • Security groups for valohai-sg-master and valohai-sg-workers

  • Names of subnets that can be used for Valohai workers

  • Name of the Key Pair in your AWS account

  • Public IP of the EC2 instance for Valohai

  • Private IP of the EC2 instance for Valohai
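If you collect these details in a script before the installation call, a small record type makes it easy to see what is still missing. The Python sketch below is a hypothetical helper, not part of any Valohai tooling; its field names are illustrative, and the secret access key should of course never be committed to version control.

```python
from dataclasses import dataclass, fields


@dataclass
class ValohaiAwsDetails:
    """Hypothetical checklist of the details gathered in this guide."""
    region: str = ""
    bucket: str = ""
    master_access_key_id: str = ""
    master_secret_access_key: str = ""  # keep this out of version control
    worker_role_arn: str = ""
    vpc: str = ""
    sg_master: str = ""
    sg_workers: str = ""
    subnets: tuple[str, ...] = ()
    key_pair_name: str = ""
    master_public_ip: str = ""
    master_private_ip: str = ""

    def missing(self) -> list[str]:
        """Names of the fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


if __name__ == "__main__":
    details = ValohaiAwsDetails(region="eu-west-1", bucket="valohai-bucket")
    print("Still missing:", ", ".join(details.missing()))
```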