Tip: Valohai agents can be installed on an on-premises machine running Linux, preferably Ubuntu 22.04.
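To confirm that a machine matches the preferred OS before installing, the following minimal sketch reads `/etc/os-release`, the standard os-release file, which uses shell-compatible `KEY=value` lines. It checks for `ID=ubuntu` and `VERSION_ID=22.04`. The function name `is_recommended_os` is hypothetical and is not part of Valohai's tooling.

```bash
#!/usr/bin/env bash
# Sketch: report whether a machine matches the preferred OS (Ubuntu 22.04).
# Hypothetical helper, not part of Valohai's tooling.

is_recommended_os() {
  local file="${1:-/etc/os-release}"
  [ -r "$file" ] || return 2
  # os-release uses shell-compatible KEY=value syntax, so source it in a
  # subshell to keep its variables out of the caller's environment.
  (
    . "$file"
    [ "${ID:-}" = "ubuntu" ] && [ "${VERSION_ID:-}" = "22.04" ]
  )
}

if is_recommended_os; then
  echo "Ubuntu 22.04 detected"
else
  echo "Not Ubuntu 22.04; other Linux distributions are not the preferred target"
fi
```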
Deploying the Valohai Compute and Data Layer
You can deploy the Valohai Compute and Data Layer to your on-premises environment. This lets you:
- Use your own on-premises machines to run machine learning jobs.
- Use your own cloud storage for training artifacts, such as trained models, preprocessed datasets, and visualizations.
- Mount local data to your on-premises workers.
- Access databases and data warehouses directly from the workers, which are inside your network.
Valohai doesn’t have direct access to the on-premises machines that execute machine learning jobs. Instead, it communicates with a separate, static virtual machine in your on-premises environment that stores the job queue, job states, and short-term logs.
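Because the workers talk to that queue machine rather than Valohai reaching into them, each worker must be able to open a network connection to the queue. A minimal sketch, assuming bash and coreutils `timeout`, tests whether a TCP connection to the queue address opens on port 63790, the port used in the setup script's `REDIS_URL`. The helper `can_reach` is hypothetical, and a successful connection says nothing about TLS or the Redis password.

```bash
#!/usr/bin/env bash
# Sketch: check that this machine can open a TCP connection to the queue host.
# Uses bash's /dev/tcp redirection; this tests reachability only, not TLS
# or authentication. Hypothetical helper.

can_reach() {
  local host="$1" port="${2:-63790}" seconds="${3:-5}"
  # Open file descriptor 3 to host:port in a child bash; fails on refusal,
  # and timeout stops it if the connection hangs.
  timeout "$seconds" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example, using the queue address provided by Valohai:
if [ -n "${QUEUE_ADDRESS:-}" ]; then
  if can_reach "$QUEUE_ADDRESS" 63790; then
    echo "Queue reachable at ${QUEUE_ADDRESS}:63790"
  else
    echo "Cannot reach ${QUEUE_ADDRESS}:63790; check firewall rules" >&2
  fi
fi
```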
Installing the Valohai Worker (Peon)
The Valohai agent (Peon) is responsible for fetching new jobs, writing logs, and updating the job states for Valohai.
Python 3.8 or later must already be installed on each machine. The peon-bringup installer (bup) installs the other dependencies, such as Docker and, where needed, NVIDIA-Docker.
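A quick pre-flight check for the Python requirement could look like the sketch below. It assumes GNU `sort -V` (available on Ubuntu) for version comparison; `version_ge` and `check_python` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: verify that python3 is at least 3.8 before running the installer.

# Succeeds if version $1 >= version $2: after a version sort, the smaller
# value comes first, so $2 must be the first line.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

check_python() {
  local min="3.8" found
  if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 not found; install it first" >&2
    return 1
  fi
  # "Python 3.10.12" -> "3.10.12"
  found="$(python3 --version 2>&1 | awk '{print $2}')"
  if version_ge "$found" "$min"; then
    echo "python3 ${found} OK (>= ${min})"
  else
    echo "python3 ${found} is older than ${min}" >&2
    return 1
  fi
}

check_python
```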
Requirements
Before running the setup script, you’ll need the following information from Valohai:
- name: the queue name that this on-premises machine will use.
- queue-address: the address assigned to the queue in your subscription.
- redis-password: the password your queue uses. This is usually stored in your cloud provider’s Secret Manager.
- url: download URL for the Valohai worker installer.
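In the setup script, these four values become the shell variables `NAME`, `QUEUE_ADDRESS`, `PASSWORD`, and `URL`. As a sketch, assuming bash (for `${!var}` indirect expansion), a hypothetical `check_required_vars` function can catch a missing value or an unreplaced `<placeholder>` before the installer runs:

```bash
#!/usr/bin/env bash
# Sketch: confirm the four values from Valohai are set, and that no
# <placeholder> text was left in, before running the installer.
# Hypothetical helper.

check_required_vars() {
  local var value missing=0
  for var in NAME QUEUE_ADDRESS PASSWORD URL; do
    value="${!var:-}"   # indirect expansion: value of the variable named $var
    if [ -z "$value" ]; then
      echo "Missing: ${var}" >&2
      missing=1
    elif [[ "$value" == "<"*">" ]]; then
      echo "Placeholder not replaced: ${var}=${value}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```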
What’s a queue name
The queue name is a name you define to add the machine to a queue group. For example:
- myorg-onprem-1
- myorg-onprem-machine-name
- myorg-onprem-gpus
- myorg-onprem-gpus-prod
Each machine can have its own queue, but we recommend using the same queue name on all machines that have the same configuration and serve the same purpose.
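The examples follow an `<org>-onprem-<qualifier>` pattern. A minimal sketch of a hypothetical `make_queue_name` helper composes names that way. The allowed character set (lowercase letters, digits, and single hyphens between parts) is an assumption chosen to match the examples, not a documented Valohai rule.

```bash
#!/usr/bin/env bash
# Sketch: build a queue name following the <org>-onprem-<qualifier...> pattern.
# The character rule below is an assumption, not a documented Valohai rule.

make_queue_name() {
  local org="$1"; shift
  local name="${org}-onprem"
  local part
  for part in "$@"; do
    name="${name}-${part}"
  done
  # Lowercase letters and digits, separated by single hyphens.
  if [[ ! "$name" =~ ^[[:lower:][:digit:]]+(-[[:lower:][:digit:]]+)*$ ]]; then
    echo "Invalid queue name: ${name}" >&2
    return 1
  fi
  printf '%s\n' "$name"
}

# One name per configuration and purpose, shared by every matching machine:
make_queue_name myorg gpus prod   # prints myorg-onprem-gpus-prod
```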
Setup script
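# Open a root shell; enter the remaining commands in it
sudo su
# Install Python 3, which the installer requires
apt-get update -y && apt-get install -y python3 python3-distutils
# Work in a temporary directory
TEMPDIR=$(mktemp -d)
pushd "$TEMPDIR"
# Replace each <placeholder> with the value provided by Valohai
export NAME=<queue-name>
export QUEUE_ADDRESS=<queue-address>
export PASSWORD=<redis-password>
export URL=<bup-url>
# Download the peon-bringup installer (bup); fail on HTTP errors, follow redirects
curl --fail --location "$URL" --output bup.pex
chmod u+x bup.pex
# Run the installer; QUEUES sets the queue this worker takes jobs from
env "CLOUD=none" "ALLOW_MOUNTS=true" "INSTALLATION_TYPE=private-worker" "REDIS_URL=rediss://:$PASSWORD@$QUEUE_ADDRESS:63790" 'PEON_EXTRA_CONFIG={"ALLOW_MOUNTS":"true"}' "QUEUES=$NAME" ./bup.pex
popd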
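The installer receives the queue connection as a single `REDIS_URL` of the form `rediss://:<password>@<queue-address>:63790`, where the `rediss` scheme denotes Redis over TLS. A minimal sketch, using the hypothetical helper name `build_redis_url`, composes that string the same way the script does and rejects passwords containing `@`, `/`, `?`, or `#`, characters that would change how the URL is parsed. Whether the installer accepts percent-encoded passwords instead is not covered here.

```bash
#!/usr/bin/env bash
# Sketch: compose the REDIS_URL passed to bup.pex in the setup script.
# Format (from the script): rediss://:<password>@<queue-address>:63790
# Hypothetical helper.

build_redis_url() {
  local password="$1" address="$2" port="${3:-63790}"
  # These characters would be read as URL delimiters inside the password.
  case "$password" in
    *[@/?#]*)
      echo "Password contains @, /, ? or #; the URL would not parse as intended" >&2
      return 1
      ;;
  esac
  printf 'rediss://:%s@%s:%s\n' "$password" "$address" "$port"
}

# Usage inside the setup script, before running the installer:
# REDIS_URL="$(build_redis_url "$PASSWORD" "$QUEUE_ADDRESS")"
```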