The Compute and Data Layer of Valohai can be deployed to your Azure Resource Group. This enables you to:
- Use your own Virtual Machine instances to run machine learning jobs.
- Use your own Azure Blob Storage for storing training artifacts such as trained models, preprocessed datasets, visualizations, etc.
- Access databases and data warehouses directly from the workers, which are inside your network.
Valohai doesn’t have direct access to the virtual machine instances that execute the machine learning jobs. Instead, it communicates with a static virtual machine in your resource group that’s responsible for storing the job queue, job states, and short-term logs.
Resource Group
You need to create a resource group to host the Valohai-managed resources.
Navigate to Resource Group Management and select “Add”.
Select the Subscription you’d like the resources to be created within, then name the Resource Group.
If you’re not feeling creative, name the group valohai for simplicity. However, take note of the name, as Valohai engineers will need this.
Also, select the appropriate region for the resources:
- When selecting your region, remember that regions have different collections of available GPU types.
- For US customers, we recommend East US or West US 2 as they have the widest array of GPU machine types in the United States.
- For EU customers, we recommend West Europe as it has the widest array of GPU machine types in Europe.
- Check the Azure product availability page for more details.
- Consider using the same region where your data is located to reduce data transfer times.
- Consider using the regions where you’ve already acquired GPU quota from Microsoft.
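If you prefer the command line to the portal, the resource group can also be created with the Azure CLI. The following is a minimal sketch, assuming `az login` has already been run and the intended subscription is active. The helper names `create_valohai_group` and `list_gpu_sizes` are hypothetical, not part of any Valohai tooling.

```shell
# Sketch: create the resource group from the command line.
# Assumes the Azure CLI (az) is installed and logged in.
create_valohai_group() {
  local name="$1" location="$2"
  az group create --name "$name" --location "$location"
}

# List N-series (GPU) VM sizes offered in a region, to help compare regions.
list_gpu_sizes() {
  local location="$1"
  az vm list-skus --location "$location" --resource-type virtualMachines \
    --size Standard_N --output table
}
```

For example, `create_valohai_group valohai westeurope` would create a group named valohai in West Europe, and `list_gpu_sizes westeurope` would show which GPU machine sizes that region offers.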
Virtual Network
Valohai will need to know which virtual network to use. You can either provide an existing vNet and subnets or create a new one.
To create a new virtual network:
- Go to your resource group and select Add and search for Virtual Network.
- Give it a name (for example valohai-vnet) and select your region.
- You can then either specify your own IP address ranges or just proceed with the default configuration by clicking Review + create.
Valohai will spin up all the virtual machines used for your machine learning jobs inside this virtual network.
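The same virtual network can be created with the Azure CLI. This is a sketch under stated assumptions: the address ranges 10.0.0.0/16 and 10.0.0.0/24 are illustrative values only, and `create_valohai_vnet` is a hypothetical helper name. Adjust the ranges to your organization's policy.

```shell
# Sketch: create a virtual network with a single "default" subnet.
# The address prefixes below are example values, not Valohai requirements.
create_valohai_vnet() {
  local group="$1" location="$2"
  az network vnet create \
    --resource-group "$group" \
    --name valohai-vnet \
    --location "$location" \
    --address-prefixes 10.0.0.0/16 \
    --subnet-name default \
    --subnet-prefixes 10.0.0.0/24
}
```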
How large should my subnet be?
There is no hard requirement from Valohai. The IP ranges and sizes will depend on your organization’s policy and the number of parallel jobs you need to run. Valohai will have just one static virtual machine with a public IP in the resource group. All other machines (and their resources) will be created and destroyed according to the scaling rules set by the organization admin in Valohai.
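As a rough sizing aid, you can estimate how many machines fit in a subnet of a given prefix length. Azure reserves five IP addresses in every subnet, so the usable count is the subnet size minus five. The sketch below assumes each worker takes one private IP address; `usable_ips` is a hypothetical helper name.

```shell
# Sketch: usable IP addresses in an Azure subnet with the given prefix length.
# Azure reserves 5 addresses per subnet.
usable_ips() {
  local prefix="$1"
  echo $(( (1 << (32 - prefix)) - 5 ))
}

usable_ips 24   # prints 251
usable_ips 26   # prints 59
```

One of those addresses is taken by the static valohai-queue machine, so a /24 subnet leaves room for roughly 250 parallel workers.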
Do Valohai machines need outbound internet access (egress)?
It’s strongly recommended to provide outbound network access to the machines.
The static valohai-queue machine will need outbound network access to download assets to operate the job queue.
We have a proxy in place. Can we configure Valohai’s workers to use our proxy?
Yes. The use of the proxy will need to be set up by Valohai Support. Please share the details of your proxy with your Valohai contact.
Does Valohai need to access the machines in our network?
No. You can set up all the resources yourself and block inbound access (ingress).
Can we use an existing virtual network?
Yes. You’ll just need to provide Valohai Support with details of which virtual network should be used for the workers.
App Registration
Next, create an app registration in your Azure AD to allow Valohai programmatic access to your resource group. This will allow Valohai to create and delete virtual machines that are used for your machine learning jobs. The scope can be limited only to this resource group.
This can be done at the App Registration management panel:
- Click New registration.
- Any name for the application will do – “Valohai” is a good choice.
- The “Supported Account Type” option should be left at “Accounts in this organizational directory only (Your Organization Name Here)”.
- The Redirect URI can be left empty.
Once the App Registration is created, take note of the Application (client) and Directory (tenant) ID values displayed.
Then navigate to the new app registration and select “Certificates & Secrets”, then “New client secret”.
- Any Description will do – “Valohai Secret,” for instance, is fine.
- The Expiry time should preferably be “12 months” or according to your company policy. Make a note of the expiry time as you’ll have to share it with your Valohai contact.
Once the Secret is created, copy the value from the table and make a note of it – this is the only time you’ll be able to see it.
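The portal steps above can also be approximated with the Azure CLI. This is a sketch only, assuming a recent Azure CLI version (the `--sign-in-audience` and `--display-name` options are not available in older releases) and an account allowed to create app registrations. `register_valohai_app` is a hypothetical helper name.

```shell
# Sketch: register an app, create its service principal, and issue a
# client secret that is valid for one year.
register_valohai_app() {
  local name="$1" app_id
  # Single-tenant registration; capture the Application (client) ID.
  app_id=$(az ad app create --display-name "$name" \
    --sign-in-audience AzureADMyOrg --query appId --output tsv)
  # Create the service principal that role assignments will target.
  az ad sp create --id "$app_id" >/dev/null
  # Prints JSON with the appId, the generated secret ("password"), and tenant.
  az ad app credential reset --id "$app_id" \
    --display-name "Valohai Secret" --years 1
}
```

As in the portal, the secret is only shown once in the command output, so note it down immediately along with its expiry date.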
Permissions
Once the App Registration has been created, you will need to grant it access to manage resources.
- Navigate to your resource group.
- Take note of the subscription ID.
Now select “Access Control (IAM)”. We’ll need to create a new role ValohaiMasterRole:
- Open the Roles tab.
- Click Add custom role.
- Give the role the name ValohaiMasterRole.
- Open the Assignable scopes tab. Make sure you’ve selected the correct resource group(s).
- Open the JSON tab and replace the permissions section with the permissions from below.
- Save your changes.
"permissions": [
  {
    "actions": [
      "Microsoft.Resources/deployments/validate/action",
      "Microsoft.Resources/deployments/write",
      "Microsoft.Resources/deployments/operationStatuses/read",
      "Microsoft.Network/virtualNetworks/subnets/read",
      "Microsoft.Network/networkSecurityGroups/read",
      "Microsoft.Network/networkSecurityGroups/join/action",
      "Microsoft.Network/networkSecurityGroups/write",
      "Microsoft.Network/publicIPAddresses/write",
      "Microsoft.Network/publicIPAddresses/read",
      "Microsoft.Network/publicIPAddresses/delete",
      "Microsoft.Network/publicIPAddresses/join/action",
      "Microsoft.Network/networkInterfaces/read",
      "Microsoft.Network/networkInterfaces/write",
      "Microsoft.Network/networkInterfaces/join/action",
      "Microsoft.Network/networkInterfaces/delete",
      "Microsoft.Network/networkInterfaces/effectiveRouteTable/action",
      "Microsoft.Network/networkInterfaces/effectiveNetworkSecurityGroups/action",
      "Microsoft.Network/networkInterfaces/UpdateParentNicAttachmentOnElasticNic/action",
      "Microsoft.Network/virtualNetworks/subnets/join/action",
      "Microsoft.Network/virtualNetworks/subnets/virtualMachines/read",
      "Microsoft.Network/networkSecurityGroups/securityRules/write",
      "Microsoft.Network/networkSecurityGroups/securityRules/read",
      "Microsoft.Network/networkSecurityGroups/securityRules/delete"
    ],
    "notActions": [],
    "dataActions": [],
    "notDataActions": []
  }
]
Next, we’ll assign the role to our service principal.
- On the IAM page, click Add role assignment.
- Search for ValohaiMasterRole and click Next.
- Make sure “User, group, or service principal” is selected and click Select members. Then search for the service principal by writing its name.
- Click Review and assign and save your changes.
Next, add the Virtual Machine Contributor role.
- On the IAM page, click Add role assignment.
- Search for Virtual Machine Contributor and click Next.
- Make sure “User, group, or service principal” is selected and click Select members. Then search for the service principal by writing its name.
- Click Review and assign and save your changes.
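Both role assignments can also be made with the Azure CLI, scoped to the resource group only. The sketch below assumes the ValohaiMasterRole custom role already exists and that you have the Application (client) ID at hand. `rg_scope` and `assign_valohai_roles` are hypothetical helper names.

```shell
# Build the resource ID that limits an assignment to one resource group.
rg_scope() {
  echo "/subscriptions/$1/resourceGroups/$2"
}

# Sketch: assign both roles to the app registration's service principal.
assign_valohai_roles() {
  local app_id="$1" subscription="$2" group="$3" role
  for role in "ValohaiMasterRole" "Virtual Machine Contributor"; do
    az role assignment create --assignee "$app_id" --role "$role" \
      --scope "$(rg_scope "$subscription" "$group")"
  done
}
```

Passing an explicit `--scope` keeps the permissions limited to the Valohai resource group rather than the whole subscription.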
A managed identity
We’ll assign a user-assigned managed identity to the virtual machine that’s running the Valohai queue. The identity will be used to authenticate the VM with the Key Vault and fetch the saved secrets.
Open Azure Managed Identities.
Create a new user identity:
- Resource Group: The one you created earlier.
- Region: Your selected region.
- Name: valohai-queue.
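As a command-line alternative, the identity can be created with the Azure CLI. A minimal sketch, with `create_queue_identity` as a hypothetical helper name:

```shell
# Sketch: create the user-assigned managed identity for the queue VM.
create_queue_identity() {
  local group="$1" location="$2"
  az identity create --resource-group "$group" \
    --name valohai-queue --location "$location"
}
```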
Key Vault
We’ll store the queue password in the Key Vault.
Open Azure Key Vaults:
- Resource Group: The resource group you created earlier.
- Key vault name: valohai-queue-key.
- Region: Your region.
- Access Policy:
  - Permission Model: Vault access policy.
  - Click on Add Access Policy.
    - Secret Permissions: Get.
    - Select Principal: valohai-queue.
  - Click on Add Access Policy.
    - Secret Permissions: Get.
    - Select Principal: valohai (this is the app registration name).
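The Key Vault and its two access policies can be sketched with the Azure CLI as follows. This assumes the valohai-queue managed identity already exists in the same resource group and that you know the Application (client) ID of the app registration. `create_queue_vault` is a hypothetical helper name.

```shell
# Sketch: create a vault that uses access policies (not Azure RBAC) and
# grant secret "get" to the queue identity and the app registration.
create_queue_vault() {
  local vault="$1" group="$2" location="$3" app_id="$4" identity_id
  az keyvault create --name "$vault" --resource-group "$group" \
    --location "$location" --enable-rbac-authorization false
  # Look up the object (principal) ID of the managed identity.
  identity_id=$(az identity show --resource-group "$group" \
    --name valohai-queue --query principalId --output tsv)
  az keyvault set-policy --name "$vault" \
    --object-id "$identity_id" --secret-permissions get
  az keyvault set-policy --name "$vault" \
    --spn "$app_id" --secret-permissions get
}
```

Key Vault names are globally unique, so the name you pass may need an organization-specific suffix.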
A secret
Add a new secret to your newly created Key Vault:
- Upload options: Manual.
- Name: ValohaiRedisSecret.
- Value: Generate a secret with letters and numbers (no special characters).
- Create.
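One way to produce a value with only letters and numbers is to hex-encode random bytes. The sketch below generates 32 hexadecimal characters from 16 random bytes and, optionally, stores them with the Azure CLI. `generate_queue_secret` and `store_queue_secret` are hypothetical helper names, and the length is an illustrative choice.

```shell
# Sketch: 16 random bytes rendered as 32 hex characters (0-9, a-f only).
generate_queue_secret() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Store a freshly generated value as ValohaiRedisSecret without echoing it.
store_queue_secret() {
  local vault="$1"
  az keyvault secret set --vault-name "$vault" --name ValohaiRedisSecret \
    --value "$(generate_queue_secret)" --output none
}
```

Using `--output none` keeps the secret value out of the terminal output and shell logs.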
Valohai Queue Instance
The Valohai queue instance will handle the job queue. app.valohai.com will submit jobs to the queue, and your workers will read their job information from the queue.
Create a new virtual machine in the region and zone where you created your virtual network.
- Name: valohai-queue.
- Image: Ubuntu Server 20.04 LTS.
- Authentication Type: SSH Key.
- Username: For example “ubuntu”.
- Inbound Port rules: By default, it will allow Port 22 for SSH access, but you can edit this according to your policy.
- Size: B2s (2 vCPUs, 4GB of memory).
- Disk:
  - Disk Type: Premium SSD (locally redundant).
- Network:
  - Virtual Network: The network you created.
  - Subnet: default.
    - You can also choose another subnet. Note: This machine will need outbound internet access.
  - Public IP: Create new.
- Create the virtual machine.
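A roughly equivalent Azure CLI call is sketched below. The image URN is an assumption for Ubuntu Server 20.04 LTS and should be checked with `az vm image list` before use. Standard_B2s is the CLI name of the B2s size, and `create_queue_vm` is a hypothetical helper name.

```shell
# Sketch: create the queue VM in an existing virtual network.
# A public IP is created by default; --generate-ssh-keys reuses or creates
# an SSH key pair under ~/.ssh.
create_queue_vm() {
  local group="$1" vnet="$2"
  az vm create \
    --resource-group "$group" \
    --name valohai-queue \
    --image Canonical:0001-com-ubuntu-server-focal:20_04-lts-gen2:latest \
    --size Standard_B2s \
    --admin-username ubuntu \
    --generate-ssh-keys \
    --storage-sku Premium_LRS \
    --vnet-name "$vnet" \
    --subnet default \
    --nsg valohai-queue-nsg
}
```

Naming the network security group explicitly with `--nsg` makes it easy to find when adding inbound rules later.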
User assigned identity
Go to the newly created virtual machine resource in the Azure Portal and navigate to the Identity section (under Settings).
- Open the User assigned tab.
- Add a new identity “valohai-queue”.
- Save changes.
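With the Azure CLI, attaching the identity is a single command. A minimal sketch, assuming the identity and the VM live in the same resource group; `attach_queue_identity` is a hypothetical helper name.

```shell
# Sketch: attach the user-assigned identity to the queue VM.
attach_queue_identity() {
  local group="$1"
  az vm identity assign --resource-group "$group" \
    --name valohai-queue --identities valohai-queue
}
```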
Inbound Rules
Open the Network Security Group associated with your instance (e.g. valohai-queue-nsg).
Navigate to the Inbound security rules.
Add two new rules:
ValohaiApp:
- Source: IP Address.
- Source IP Addresses/CIDR: 34.248.245.191, 63.34.156.112.
- Source Port Ranges: *.
- Destination Port Ranges: 63790.
- Protocol: TCP.
- Name: ValohaiApp.
- Description: Allows app.valohai.com and the Valohai scaling service to access the job queue to submit jobs and fetch job status.
Valohai Certificate:
- Source: Any.
- Destination Port Ranges: 80.
- Protocol: TCP.
- Name: ValohaiLetsEncryptCertificate.
- Description: Allows using LetsEncrypt HTTP challenge to provision a certificate on the machine.
- NOTE: You can also provision your own certificate on the machine and not open port 80. Please contact Valohai support for details.
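The two rules can also be added with the Azure CLI. In this sketch the priorities 1010 and 1020 are arbitrary example values (every rule needs a unique priority within the group), and `add_queue_rules` is a hypothetical helper name.

```shell
# Sketch: add the two inbound rules to the queue's network security group.
add_queue_rules() {
  local group="$1" nsg="$2"
  # Valohai services reaching the job queue on port 63790.
  az network nsg rule create --resource-group "$group" --nsg-name "$nsg" \
    --name ValohaiApp --priority 1010 --direction Inbound --access Allow \
    --protocol Tcp \
    --source-address-prefixes 34.248.245.191 63.34.156.112 \
    --source-port-ranges '*' --destination-port-ranges 63790
  # Let's Encrypt HTTP challenge on port 80 from any source.
  az network nsg rule create --resource-group "$group" --nsg-name "$nsg" \
    --name ValohaiLetsEncryptCertificate --priority 1020 \
    --direction Inbound --access Allow --protocol Tcp \
    --source-address-prefixes '*' \
    --source-port-ranges '*' --destination-port-ranges 80
}
```

If you provision your own certificate, the second rule can be left out.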
SSH into your Virtual Machine
Now that your virtual machine has access to the Key Vault, we can SSH into the machine and complete the Valohai job queue setup.
Note you’ll need to update:
- export QUEUE= to include your queue address (e.g. export QUEUE=myqueue.vqueue.net). You’ll get the address from your Valohai contact.
- VAULT_NAME in the line that starts with PASSWORD. Replace that with your Vault’s name (e.g. valohai-myorg).
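# Queue address provided by your Valohai contact, e.g. myqueue.vqueue.net
export QUEUE=
# Get a Key Vault access token from the instance metadata service using the VM's managed identity
ACCESS_TOKEN=$(curl 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fvault.azure.net' -H Metadata:true | python3 -c "import sys, json; print(json.load(sys.stdin)['access_token'])")
# Read the queue password from the Key Vault (replace VAULT_NAME with your vault's name)
PASSWORD=$(curl 'https://VAULT_NAME.vault.azure.net/secrets/ValohaiRedisSecret?api-version=2016-10-01' -H "Authorization: Bearer $ACCESS_TOKEN" | python3 -c "import sys, json; print(json.load(sys.stdin)['value'])")
# Download and run the queue setup script with the address and password
curl https://raw.githubusercontent.com/valohai/worker-queue/main/host/setup.sh | sudo QUEUE_ADDRESS="$QUEUE" REDIS_PASSWORD="$PASSWORD" bash
# Remove the secrets from the shell session
unset PASSWORD ACCESS_TOKEN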
Registering Resource Providers for the Subscription
Registering a resource provider configures your subscription to work with the given resource provider. Essentially, registering a provider means “enabling” the related services on your subscription.
Valohai uses the following resource providers:
- Microsoft.Compute
- Microsoft.Network
To verify that the above resource providers are registered:
- Navigate to “Azure Portal > Subscriptions”.
- Select the subscription that will be used for Valohai.
- Navigate to “Resource providers” through the menu on the left.
- Register the following providers if they aren’t already:
- Microsoft.Compute
- Microsoft.Network
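The same check and registration can be scripted with the Azure CLI. A minimal sketch, assuming the subscription used for Valohai is the active one; `register_valohai_providers` is a hypothetical helper name.

```shell
# Sketch: register both providers and print their registration state.
# Registering an already registered provider is harmless.
register_valohai_providers() {
  local ns
  for ns in Microsoft.Compute Microsoft.Network; do
    az provider register --namespace "$ns"
    az provider show --namespace "$ns" \
      --query registrationState --output tsv
  done
}
```

Registration runs asynchronously, so the state may read Registering for a while before it changes to Registered.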
Conclusion
You should now have the following values:
- Region
- Subscription ID
- Resource Group Name
- Directory (tenant) ID
- Application (client) ID
- Application Secret
- Application Secret Expiry Date
- Virtual Network name
- Subnet name (optional)
- Private IP of the valohai-queue Virtual Machine
- Public IP of the valohai-queue Virtual Machine
- Name of your Key Vault
Share this information with your Valohai contact using the Vault credentials provided to you.