Valohai Hybrid
The Valohai hybrid model is the most popular installation mode. It keeps your data and compute securely in your own cloud, without you having to host and manage the core Valohai components yourself.
The Compute and Data Layer of Valohai can be deployed to your AWS Account. This enables you to:
- Use your own EC2 instances to run machine learning jobs.
- Use your own S3 Buckets for storing training artifacts such as trained models, preprocessed datasets, visualizations, etc.
- Access databases and data warehouses directly from the workers, which are inside your network.
Valohai doesn’t have direct access to the EC2 instances that execute the machine learning jobs. Instead, it communicates with a static EC2 instance in your AWS account that’s responsible for storing the job queue, job states, and short-term logs.
What will get deployed?
This template is designed to provision the required services in a fresh AWS Account. The following services will be deployed:
- VPC and subnets in the selected region. Valohai will also deploy an Internet Gateway and route tables.
- Two security groups for Valohai resources:
  - valohai-sg-workers, used by all the autoscaled Valohai EC2 instances.
    - By default it has no ports open. You’ll have to open ports yourself, for example to connect to the instances over SSH.
  - valohai-sg-queue, for the valohai-queue EC2 instance. It:
    - allows app.valohai.com to connect to Redis (over TLS) on port 63790,
    - allows the autoscaled Valohai workers to connect to Redis on port 63790,
    - opens port 80 for the Let’s Encrypt challenge and certificate renewal.
- EC2 instance (valohai-queue) that’s responsible for storing the job queue, job states, and short-term logs. Valohai communicates with this machine (Redis over TLS) to schedule new jobs and access the logs of existing jobs.
  - You’ll need to provide a key pair, uploaded to your AWS account, for connecting to this instance.
  - The machine will also have an Elastic IP attached to it.
- A secret stored in your AWS Secrets Manager. The secret valohai_redis_server contains the password for the Redis server running on your valohai-queue instance.
- An S3 bucket where Valohai will upload logs from your executions and commit snapshots. All generated artifacts will also be uploaded to this bucket by default.
- IAM roles:
  - ValohaiQueueRole is attached to the valohai-queue instance and allows it to fetch the generated password from your AWS Secrets Manager. Access is restricted to secrets that are tagged valohai:1.
  - ValohaiWorkerRole is attached to all autoscaled EC2 instances that are launched for machine learning jobs.
  - ValohaiMaster is the role that the Valohai service uses to manage autoscaling and EC2 resources. It is also used to manage the newly provisioned valohai-data-* S3 bucket.
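The Secrets Manager restriction on ValohaiQueueRole can be expressed as an IAM policy like the one below. This is a minimal sketch, not the policy from Valohai's templates. It assumes that "tagged valohai:1" means a tag with key valohai and value 1. The statement ID and both helper names are hypothetical. The policy uses the secretsmanager:ResourceTag/<key> condition key so that GetSecretValue only succeeds on tagged secrets.

```python
import json


def queue_role_secrets_policy() -> dict:
    """Hypothetical IAM policy document for ValohaiQueueRole.

    Allows reading secret values only from secrets tagged valohai=1.
    Illustration only; the real policy lives in Valohai's templates.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadValohaiTaggedSecrets",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": "*",
                # The tag condition, not the Resource field, limits access.
                "Condition": {
                    "StringEquals": {"secretsmanager:ResourceTag/valohai": "1"}
                },
            }
        ],
    }


def is_readable_by_queue_role(secret_tags: dict) -> bool:
    """Mirror the policy condition locally: True only if the secret has valohai=1."""
    return secret_tags.get("valohai") == "1"


if __name__ == "__main__":
    print(json.dumps(queue_role_secrets_policy(), indent=2))
```

Because Resource is "*", the tag condition alone decides which secrets the role can read. Tagging any other secret with valohai:1 would therefore also make it readable from the queue instance.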
Deploying resources
You can deploy your Valohai Hybrid environment with either a CloudFormation template or a Terraform template.
CloudFormation Template
Requirements
Before you can deploy Valohai to your environment, you’ll need to get the AssumeRoleARN and QueueAddress from Valohai support.
The current version of these CloudFormation templates can be deployed from:
# Deploy this stack first
https://valohai-cfn-templates-public.s3.eu-west-1.amazonaws.com/iam.yml
# This generates the ValohaiMaster role. Make a note of its ARN.
# Run this after you've deployed the IAM stack.
# You'll get the value for ValohaiMasterRoleARN from the previous stack.
https://valohai-cfn-templates-public.s3.eu-west-1.amazonaws.com/aws-hybrid-workers.yml
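The two-stack order above can be sketched in plain Python that builds the arguments for CloudFormation's CreateStack call. The sketch rests on several assumptions. The stack names are hypothetical. AssumeRoleARN is assumed to belong to the IAM stack and QueueAddress to the workers stack. CAPABILITY_NAMED_IAM is assumed to be required because the stacks create named IAM roles. Check each template's Parameters section for the names it actually expects.

```python
from typing import Dict, List

# Template URLs as listed above.
IAM_TEMPLATE_URL = "https://valohai-cfn-templates-public.s3.eu-west-1.amazonaws.com/iam.yml"
WORKERS_TEMPLATE_URL = (
    "https://valohai-cfn-templates-public.s3.eu-west-1.amazonaws.com/aws-hybrid-workers.yml"
)


def to_cfn_parameters(values: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert {"Key": "value"} into CloudFormation's ParameterKey/ParameterValue list."""
    return [{"ParameterKey": key, "ParameterValue": value} for key, value in values.items()]


def iam_stack_request(stack_name: str, params: Dict[str, str]) -> dict:
    """CreateStack arguments for the first stack (iam.yml), which creates ValohaiMaster."""
    return {
        "StackName": stack_name,
        "TemplateURL": IAM_TEMPLATE_URL,
        "Parameters": to_cfn_parameters(params),
        # Assumption: named IAM roles require this explicit acknowledgement.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def workers_stack_request(
    stack_name: str, valohai_master_role_arn: str, params: Dict[str, str]
) -> dict:
    """CreateStack arguments for the second stack, which needs the first stack's role ARN."""
    if not valohai_master_role_arn.startswith("arn:"):
        raise ValueError("deploy the IAM stack first and pass the ValohaiMaster role ARN")
    merged = {"ValohaiMasterRoleARN": valohai_master_role_arn, **params}
    return {
        "StackName": stack_name,
        "TemplateURL": WORKERS_TEMPLATE_URL,
        "Parameters": to_cfn_parameters(merged),
        "Capabilities": ["CAPABILITY_NAMED_IAM"],  # same assumption as above
    }
```

If boto3 is installed, you can pass each dict as cloudformation.create_stack(**request). Wait until the first stack reaches CREATE_COMPLETE (for example with the stack_create_complete waiter) before you read the ValohaiMaster ARN and build the second request.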
Before running the templates, you’ll need the following information from Valohai:
- AssumeRoleARN is the ARN of the user Valohai will use to assume a role in your AWS account to manage EC2 instances.
- QueueAddress is the address assigned to the queue in your account.
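AssumeRoleARN is an IAM user ARN, so a quick local check can catch copy-paste mistakes before a stack fails to deploy. The sketch below splits an ARN using the standard arn:partition:service:region:account-id:resource layout. The function names are hypothetical. The check only verifies the shape of the value, not that the user exists. The same check applies to valohai_assume_user in the Terraform variant.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split arn:partition:service:region:account-id:resource into its fields."""
    # maxsplit=5 keeps any ':' inside the resource part intact.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


def check_assume_role_arn(arn: str) -> Arn:
    """Sanity-check the value from Valohai support: it should be an IAM user ARN."""
    parsed = parse_arn(arn)
    if parsed.service != "iam" or not parsed.resource.startswith("user/"):
        raise ValueError("expected an IAM user ARN, e.g. arn:aws:iam::<account>:user/<name>")
    if not (parsed.account_id.isdigit() and len(parsed.account_id) == 12):
        raise ValueError("expected a 12-digit AWS account ID")
    return parsed
```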
You will also need to create an EC2 key pair in the AWS Console before creating a stack. This key will be used as the default SSH key for all Valohai-created resources. Valohai does not need SSH access to your machines; the key is only for your own use.
You’ll find the CloudFormation templates in our public GitHub repository.
Terraform Template
Requirements
Before you can deploy Valohai to your environment, you’ll need to get the valohai_assume_user and queue_address from Valohai support.
- valohai_assume_user is the ARN of the user Valohai will use to assume a role in your AWS account to manage EC2 instances.
- queue_address is the address assigned to the queue in your account.
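One way to pass these two values to Terraform is a variable definitions file. The sketch below writes them to terraform.tfvars.json, a file name that Terraform loads automatically from the working directory. The function name and example values are hypothetical. Any other variables the Valohai scripts declare are not known here, so they can be supplied through the extra keyword arguments.

```python
import json
from pathlib import Path


def write_tfvars(path: Path, valohai_assume_user: str, queue_address: str, **extra: str) -> dict:
    """Write the values from Valohai support to a Terraform *.tfvars.json file."""
    if not valohai_assume_user.startswith("arn:"):
        raise ValueError("valohai_assume_user should be an ARN")
    if not queue_address:
        raise ValueError("queue_address must not be empty")
    variables = {
        "valohai_assume_user": valohai_assume_user,
        "queue_address": queue_address,
        **extra,  # any additional variables the scripts may declare (assumption)
    }
    path.write_text(json.dumps(variables, indent=2) + "\n")
    return variables
```

Running terraform plan in the same directory then picks up the values. A file with a different name can be passed explicitly with -var-file.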
You’ll find the Terraform scripts in our public GitHub repository.