Linux Workers
Deploy Valohai workers on your on-premises Linux servers to run machine learning jobs on your own hardware.
Overview
The Compute and Data Layer of Valohai can be deployed to your on-premises environment. This enables you to:
Use your own on-premises machines to run machine learning jobs
Use your own cloud storage for storing training artifacts (trained models, preprocessed datasets, visualizations)
Mount local data to your on-premises workers
Access databases and data warehouses directly from workers inside your network
Valohai doesn't have direct access to the on-premises machines that execute ML jobs. Instead, it communicates with a separate static virtual machine in your on-premises environment that's responsible for storing the job queue, job states, and short-term logs.

Prerequisites
Hardware requirements:
Linux server (Ubuntu 24.04 recommended)
Python 3.10+ installed
For GPU workloads: NVIDIA drivers and NVIDIA Container Toolkit installed
From Valohai:
Contact [email protected] to receive:
queue-name - Name for this worker or group of workers
queue-address - Address of your job queue
redis-password - Password for the queue
url - Download URL for the Valohai worker installer
❗Let us know if you wish to use this queue/worker in a Dispatch mode, as it requires special configuration.
Network requirements:
Worker can connect to the queue machine on port 63790
Worker can access your object storage (S3, Azure Blob, GCS)
Optional: Outbound internet access for pulling Docker images
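As a quick sanity check of the network requirements above, you can probe connectivity from the worker before installing anything. The hostnames below are placeholders; substitute the queue address you received from Valohai and your own storage endpoint.

```shell
# Check that the worker can reach the queue machine on port 63790.
# queue.example.com is a placeholder for your queue address.
nc -vz queue.example.com 63790

# Check outbound access to your object storage (S3 endpoint as an example).
curl -sI https://s3.amazonaws.com | head -n 1
```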
Understanding Queue Names
The queue name identifies this worker or group of workers in Valohai.
Examples:
myorg-onprem-1
myorg-onprem-machine-name
myorg-onprem-gpus
myorg-onprem-gpus-prod
Each machine can have its own queue, but we recommend using the same queue name on all machines that have the same configuration and are used for the same purpose.
Installation Methods
Choose your installation method based on your operating system and preferences.
Ubuntu Installer (Recommended)
Automated installer for Ubuntu systems.
What it installs:
Valohai agent (Peon)
Docker (if not already installed)
NVIDIA Container Toolkit (if needed for GPU workloads)
System service configuration
Warning: Only use on fresh, dedicated machines. This will reinstall Docker and NVIDIA Container Toolkit, breaking any existing container workloads. Follow the manual installation steps if you want more control.
Installation:
Replace the placeholder values with the information from Valohai.
After installation, the Valohai agent will start automatically and begin pulling jobs from the queue.
💡 Setting PEON_WARDEN_ENABLED=true enables monitoring the status of the agent in the Valohai UI. If you want this disabled, set the value to false.
Manual Installation
For non-Ubuntu systems or custom configurations.
Manual Installation Steps
Step 1: Install Dependencies
Python 3.10+
Verify Python is installed:
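For example:

```shell
python3 --version   # should report Python 3.10 or newer
```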
Docker
Install Docker for your Linux distribution. Visit the Docker installation guide and select your distribution.
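On most distributions, Docker's official convenience script is the quickest route; use your distribution's packages from the Docker installation guide if you prefer:

```shell
# Docker's convenience script; see the Docker docs for distro packages.
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
```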
NVIDIA Drivers (GPU only)
If using GPUs, install NVIDIA drivers appropriate for your GPU model.
Verify installation:
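For example:

```shell
nvidia-smi   # should list your GPUs and the driver version
```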
NVIDIA Container Toolkit (GPU only)
Install NVIDIA Container Toolkit to enable GPU access in containers.
Follow the NVIDIA documentation for your distribution.
Verify it works:
Step 2: Download and Install Peon
Download the Peon agent using the URL provided by Valohai. We recommend installing the agent inside a virtual environment.
Replace <URL> with the download URL from Valohai.
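A sketch of the virtual-environment install; /opt/valohai is an example location, and <URL> stays the download URL you received from Valohai:

```shell
# Create a dedicated virtual environment and install Peon into it.
sudo mkdir -p /opt/valohai
sudo python3 -m venv /opt/valohai/venv
sudo /opt/valohai/venv/bin/pip install "<URL>"
```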
⚠️ The distutils module was removed in Python 3.12, the default on Ubuntu 24.04. As some of the Valohai components depend on it, you will need to install, for example, setuptools inside the virtual environment to ensure everything works as expected.
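Assuming the virtual environment lives at /opt/valohai/venv (adjust the path to your installation):

```shell
sudo /opt/valohai/venv/bin/pip install setuptools
```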
Step 3: Configure Peon
Create the configuration file /etc/peon.config:
Configuration values:
Replace these placeholders:
<queue-name> - Your queue name from Valohai
<redis-password> - Redis password from Valohai (stored in your cloud Secret Manager)
<queue-address> - Queue address from Valohai
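A sketch of what /etc/peon.config might look like. It is a KEY=VALUE environment file; the exact key names come from Valohai's instructions, and the names below (other than PEON_WARDEN_ENABLED, mentioned elsewhere in this guide) are illustrative assumptions:

```ini
; /etc/peon.config -- illustrative sketch, key names are assumptions
QUEUES=<queue-name>
REDIS_URL=rediss://:<redis-password>@<queue-address>:63790
PEON_WARDEN_ENABLED=true
```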
Step 4: Create Systemd Service
Create the service file /etc/systemd/system/peon.service:
Important: Update these values:
ExecStart - Path to the valohai-peon binary (use sudo find / -iname "valohai-peon" -print to find it). If you installed Peon inside a virtual environment, the binary should be there.
Common locations for a global installation:
/home/valohai/.local/bin/valohai-peon or /usr/local/bin/valohai-peon
User - Linux user that will run the service
Group - Linux group for the user
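A sketch of what the unit file might look like, assuming a virtual environment at /opt/valohai/venv and a service user named valohai; adjust ExecStart, User, and Group to your installation:

```ini
# /etc/systemd/system/peon.service -- illustrative sketch
[Unit]
Description=Valohai Peon agent
After=network-online.target docker.service
Wants=network-online.target

[Service]
Type=simple
EnvironmentFile=/etc/peon.config
ExecStart=/opt/valohai/venv/bin/valohai-peon
User=valohai
Group=valohai
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```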
Step 5: Create Cleanup Service
Create /etc/systemd/system/peon-clean.service:
Update ExecStart, User, and Group as needed.
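A sketch of the cleanup unit, mirroring the main service; the cleanup subcommand name is an assumption, so check the instructions you received from Valohai:

```ini
# /etc/systemd/system/peon-clean.service -- illustrative sketch;
# the "clean" subcommand is an assumption
[Unit]
Description=Valohai Peon cleanup

[Service]
Type=oneshot
EnvironmentFile=/etc/peon.config
ExecStart=/opt/valohai/venv/bin/valohai-peon clean
User=valohai
Group=valohai
```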
Step 6: Create Cleanup Timer
Create /etc/systemd/system/peon-clean.timer:
This runs the cleanup service every 10 minutes to remove stale caches and Docker images.
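A standard systemd timer achieving the 10-minute schedule might look like this:

```ini
# /etc/systemd/system/peon-clean.timer -- runs peon-clean.service
# every 10 minutes
[Unit]
Description=Run Valohai Peon cleanup every 10 minutes

[Timer]
OnBootSec=10min
OnUnitActiveSec=10min
Unit=peon-clean.service

[Install]
WantedBy=timers.target
```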
Step 7: Create Warden Service
Warden is a component that allows monitoring the Peon status in the Valohai UI.
Create /etc/systemd/system/peon-warden.service:
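A sketch of the Warden unit; the binary name and path are assumptions, so use the path from the instructions you received from Valohai:

```ini
# /etc/systemd/system/peon-warden.service -- illustrative sketch;
# the warden binary name/path is an assumption
[Unit]
Description=Valohai Peon warden (status monitoring)
After=peon.service

[Service]
Type=simple
EnvironmentFile=/etc/peon.config
ExecStart=/opt/valohai/venv/bin/valohai-peon-warden
User=valohai
Group=valohai
Restart=always

[Install]
WantedBy=multi-user.target
```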
Step 8: Grant Docker Permissions
The user running Peon needs permissions to control Docker:
Replace <User> with the user from your service files (e.g., valohai).
Step 9: Start Services
Reload systemd to recognize the new service files:
Start the Peon service:
Check that services are running:
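The three steps above, assuming the unit names used earlier in this guide:

```shell
sudo systemctl daemon-reload
sudo systemctl start peon.service peon-clean.timer peon-warden.service
systemctl status peon.service peon-warden.service
```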
Step 10: Enable Auto-Start
Enable services to start automatically on boot:
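Assuming the unit names used earlier in this guide:

```shell
sudo systemctl enable peon.service peon-clean.timer peon-warden.service
```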
Troubleshooting Service Start
For a global installation, if services fail to start, try using the full Python module path in ExecStart:
Use this in both peon.service and peon-clean.service files if needed.
Multi-GPU Configuration
If your server has multiple GPUs, you can configure Peon so that each execution can specify how many GPUs it should use.
If an execution requests fewer GPUs than are available, multiple executions can run at the same time on the same machine, giving much better resource utilization.
This kind of Peon configuration is called Dispatch mode. For example, a single machine with 4 GPUs could achieve any of the following utilization patterns:
4 executions, each using 1 GPU
1 execution using 1 GPU, 1 execution using 3 GPUs
1 execution using 4 GPUs
Configure Dispatch mode
Prerequisites:
Valohai worker already installed (Ubuntu installer or manual installation)
Multiple GPUs available on the server
Steps:
Stop the running Peon service.
💡 Take a look at inhibition mode, which allows you to safely stop the running Peon service without losing any information or data.
Add DISPATCH_MODE=true to the peon.config created above.
Start the Peon service again.
❗ When contacting Valohai support (see Prerequisites), don't forget to mention that the queue you wish to create will be used in Dispatch mode.
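The setting appended to the configuration file (other keys omitted here):

```ini
; /etc/peon.config -- Dispatch mode enabled
DISPATCH_MODE=true
```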
Create an execution that uses Dispatch mode
When creating an execution, you can specify how many GPUs the execution requires using the VH_GPUS environment variable.
💡Take a look at how you can add an environment variable to your execution.
VH_GPUS takes an integer value greater than 0. If not specified, it defaults to 1 and the execution will occupy 1 GPU.
Be careful not to assign a value greater than the number of GPUs available on any of the machines; in that case, the execution will stay in the queue indefinitely.
Troubleshooting
Worker Not Connecting
Check Peon service status:
View Peon logs:
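Assuming the unit name used earlier in this guide:

```shell
systemctl status peon.service     # current state of the agent
journalctl -u peon.service -f     # follow the agent's logs live
```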
Common issues:
Incorrect Redis password
Queue address unreachable
Network firewall blocking port 63790
Missing environment variables in configuration
Docker Permission Errors
If you see "permission denied" errors when running Docker:
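A typical fix, assuming a service user named valohai:

```shell
# Add the service user to the docker group, then restart the service
# so the new group membership takes effect.
sudo usermod -aG docker valohai
sudo systemctl restart peon.service
```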
NVIDIA GPU Issues
Verify NVIDIA drivers:
Test GPU access in Docker:
Check NVIDIA Container Toolkit installation:
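The three checks above; the CUDA image tag is only an example:

```shell
nvidia-smi                                                      # drivers
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi  # GPU in Docker
nvidia-ctk --version                                            # toolkit CLI
```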
If GPU access doesn't work, verify that the NVIDIA Container Toolkit is properly installed and configured (see Manual Installation Step 1).
Jobs Not Starting
Check logs:
Look for errors related to:
Docker image pull failures
Network connectivity issues
Storage access problems
Verify Redis connection:
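A basic check with redis-cli; substitute your queue address and Redis password, and add TLS options if your queue requires them:

```shell
# A PONG reply confirms the worker can authenticate against the queue.
redis-cli -h <queue-address> -p 63790 -a '<redis-password>' ping
```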
No Jobs Running or Service Stuck
Restart the Peon service:
Check for recent logs:
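Assuming the unit name used earlier in this guide:

```shell
sudo systemctl restart peon.service
journalctl -u peon.service --since "10 minutes ago"
```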
High Disk Usage
The Peon cleanup service should automatically remove old caches and Docker images.
Verify cleanup timer is running:
Manually trigger cleanup:
Check Docker disk usage:
Manual cleanup:
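The checks and cleanup steps above, assuming the unit names used earlier in this guide:

```shell
systemctl list-timers peon-clean.timer    # verify the cleanup timer is scheduled
sudo systemctl start peon-clean.service   # trigger cleanup manually
docker system df                          # check Docker disk usage
docker system prune -f                    # remove unused containers/images manually
```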
Collecting Logs for Support
If you need to contact Valohai support, collect logs:
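For example, assuming the unit name used earlier in this guide:

```shell
journalctl -u peon.service --since "24 hours ago" > peon-logs.txt
```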
Send peon-logs.txt to [email protected] with:
Description of the issue
Queue name
Server specifications (CPU, RAM, GPU)
When the issue started
Getting Help
Valohai Support: [email protected]
Include in support requests:
Operating system and version
Python version
Docker version
GPU model (if applicable)
Peon logs (see "Collecting Logs for Support" above)
Description of the issue and when it started
