Workers Behind NAT
Set up SSH debugging for Valohai workers without public IP addresses using a jump host
Some organizations require that worker instances run without public IP addresses for security. This guide shows how to set up SSH debugging in these environments using a jump host with reverse proxy.
Who needs this guide?
Platform administrators configuring Valohai in networks where:
Worker instances have only private IP addresses
Outbound internet access goes through a NAT gateway
Security policies prohibit public IPs on worker machines
Standard SSH setup: If your workers have public IPs, use the Configure SSH Access guide instead.
Architecture overview
This setup uses frp (Fast Reverse Proxy) to create tunnels from private workers to a public jump host:
User's laptop → Jump host (public IP) → Worker (private IP)
                         ↓
                frps server listening
                         ↑
           Workers connect via frpc client
How it works:
Jump host runs frps server with a public IP
Workers run frpc client and connect to jump host
Users SSH to jump host, which proxies to workers
Each worker gets a unique port on the jump host
Prerequisites
Before starting, you need:
A cloud provider account (AWS, Azure, or GCP)
Permission to create virtual machines and firewall rules
The VPC or virtual network where Valohai workers run
The security group or firewall tag used by workers
Step 1: Create the jump host
Launch a small VM in your Valohai VPC to act as the jump host:
AWS
Instance type: t3.micro or t3.small
AMI: Ubuntu 22.04 LTS
Network: Same VPC as Valohai workers
Public IP: Enabled
Security group: Create new (configured in next step)
GCP
Machine type: e2-micro or e2-small
Image: Ubuntu 22.04 LTS
Network: Same VPC as Valohai workers
External IP: Ephemeral or static
Firewall tags: valohai-jump-host
Azure
VM size: Standard_B1s or Standard_B1ms
Image: Ubuntu 22.04 LTS
Virtual network: Same as Valohai workers
Public IP: Create new
Network security group: Create new
Step 2: Configure firewall rules
Jump host inbound rules
Configure the jump host's firewall to allow:
Rule 1: frps server port (from workers)
Source: Security group of Valohai workers (e.g., valohai-sg-workers)
Protocol: TCP
Port: 7000 (or your chosen frps port)
Purpose: Workers connect to frps server
Rule 2: SSH proxy ports (from users)
Source: 0.0.0.0/0 or specific IP ranges (your office/VPN)
Protocol: TCP
Port range: 10000-50000 (or your chosen range)
Purpose: Users connect to workers through jump host
Rule 3: Administrative SSH (temporary)
Source: Your IP address
Protocol: TCP
Port: 22
Purpose: Initial setup only (can remove after setup)
AWS example
Edit the jump host's security group:
# Allow frps from workers
aws ec2 authorize-security-group-ingress \
--group-id sg-jumphost123 \
--protocol tcp \
--port 7000 \
--source-group sg-workers456
# Allow SSH proxy from users
aws ec2 authorize-security-group-ingress \
--group-id sg-jumphost123 \
--protocol tcp \
--port 10000-50000 \
--cidr 0.0.0.0/0
GCP example
Create firewall rules:
# Allow frps from workers
gcloud compute firewall-rules create valohai-jump-frps \
--network valohai-vpc \
--allow tcp:7000 \
--source-tags valohai-worker \
--target-tags valohai-jump-host
# Allow SSH proxy from users
gcloud compute firewall-rules create valohai-jump-proxy \
--network valohai-vpc \
--allow tcp:10000-50000 \
--source-ranges 0.0.0.0/0 \
--target-tags valohai-jump-host
Step 3: Install frps on jump host
SSH into the jump host and install the frps server:
# Create installation directory
sudo mkdir -p /opt/bin
cd /opt/bin
# Download frps
sudo wget https://dist.valohai.com/frp/frp_0.61.0_linux_amd64/frps.gz
sudo gunzip frps.gz
sudo chmod a+x frps
Verify the installation:
/opt/bin/frps --version
Step 4: Create frps service
Set up frps to run as a systemd service:
sudo systemctl edit --force --full frps.service
Add this configuration:
[Unit]
Description=Fast Reverse Proxy Server
After=network.target
[Service]
Type=simple
ExecStart=/opt/bin/frps --log_level=info
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
Save and exit the editor.
Step 5: Start and enable frps
Start the frps service and enable it to run on boot:
# Reload systemd configuration
sudo systemctl daemon-reload
# Start frps now and on boot
sudo systemctl enable --now frps
# Verify it's running
sudo systemctl status frps
You should see:
● frps.service - Fast Reverse Proxy Server
Loaded: loaded (/etc/systemd/system/frps.service; enabled)
Active: active (running) since ...
Step 6: Configure Valohai workers
Provide the following information to your Valohai contact, or configure it yourself if managing workers manually:
Required information:
Jump host public IP address
Jump host private IP address (for worker connections)
frps port (e.g., 7000)
Port range for user connections (e.g., 10000-50000)
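The four values above end up in a single comma-separated string in the worker configuration. As a minimal sketch, the helper below assembles that string in the format shown in the configuration examples of this guide. The function name build_forwarding_config is made up for illustration and is not part of Valohai; the IP addresses are the example values used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Valohai tool): assemble the port forwarding
# config string from the four values in the list above.
build_forwarding_config() {
  local private_ip="$1" public_ip="$2" frps_port="$3" port_range="$4"
  # Workers reach frps on the private IP; users connect to the public IP.
  printf 'type=frp,server=%s:%s,server_public=%s,port_range=%s\n' \
    "$private_ip" "$frps_port" "$public_ip" "$port_range"
}

# Example values from this guide.
build_forwarding_config 10.0.1.50 54.123.45.67 7000 10000-50000
```

Building the string in one place makes it harder to swap the private and public addresses by accident.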
For Valohai-managed workers
Send these details to your Valohai contact. They'll update your worker configuration automatically.
For self-managed workers
Edit your worker prep template and add this to extra-config-json:
{
"PEON_PORT_FORWARDING_CONFIG": "type=frp,server=<JUMP-HOST-PRIVATE-IP>:7000,server_public=<JUMP-HOST-PUBLIC-IP>,port_range=10000-50000"
}
Example:
{
"PEON_PORT_FORWARDING_CONFIG": "type=frp,server=10.0.1.50:7000,server_public=54.123.45.67,port_range=10000-50000"
}
Rerun the worker setup script to apply changes.
For static worker machines
If you have manually installed Valohai workers on static machines, edit /etc/peon.config:
PORT_FORWARDING_CONFIG=type=frp,server=<JUMP-HOST-PRIVATE-IP>:7000,server_public=<JUMP-HOST-PUBLIC-IP>,port_range=10000-50000
Restart the worker:
sudo systemctl restart peon
Step 7: Verify the setup
Test that the jump host configuration works:
Start a Valohai execution with SSH enabled
Check execution logs for SSH connection details:
SSH debugging enabled on 54.123.45.67:12345
Note the port number (e.g., 12345). This is a port in your configured range.
Test SSH connection:
ssh -i ~/.ssh/debug-key <JUMP-HOST-PUBLIC-IP> -p 12345 -t /bin/bash
If you can connect and see the execution environment, the setup is complete.
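To avoid copying the host and port by hand, the log line can be turned into the SSH command with a small script. This is only a sketch: it assumes the log line ends in host:port exactly as in the example above, and the function name ssh_command_from_log is hypothetical. The real log format may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the ssh command from a log line that ends in
# "<host>:<port>", as in the example log line in this guide.
ssh_command_from_log() {
  local line="$1" key="${2:-$HOME/.ssh/debug-key}"
  if [[ "$line" =~ ([^[:space:]:]+):([0-9]+)[[:space:]]*$ ]]; then
    printf 'ssh -i %s %s -p %s -t /bin/bash\n' \
      "$key" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    echo "no host:port found in log line" >&2
    return 1
  fi
}

# Prints: ssh -i ~/.ssh/debug-key 54.123.45.67 -p 12345 -t /bin/bash
ssh_command_from_log 'SSH debugging enabled on 54.123.45.67:12345' '~/.ssh/debug-key'
```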
How it works
When a worker starts an execution with SSH enabled:
Worker downloads and runs frpc client
frpc connects to frps server on jump host (port 7000)
frps allocates a port from the range (e.g., 12345)
User connects to jump host on allocated port
frps proxies connection to worker through existing tunnel
Port allocation: Each execution gets a unique port from the range. If 100 executions run simultaneously, ports 10000-10099 would be used.
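Because each execution takes one port, the size of the range is the upper bound on simultaneous SSH-enabled executions. The sketch below computes that bound from a range string. The function name port_range_size is an assumption made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of ports in an inclusive "START-END" range,
# i.e. the maximum number of executions that can hold a port at once.
port_range_size() {
  local range="$1"
  local start="${range%-*}" end="${range#*-}"
  if ! [[ "$start" =~ ^[0-9]+$ && "$end" =~ ^[0-9]+$ ]] || (( end < start )); then
    echo "invalid port range: $range" >&2
    return 1
  fi
  echo $(( end - start + 1 ))
}

port_range_size 10000-10099   # prints 100
port_range_size 10000-50000   # prints 40001
```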
Monitoring and maintenance
Check frps status
View frps logs:
sudo journalctl -u frps -f
Look for worker connections:
[I] [proxy.go] new proxy [vh-exec-123] success
Monitor port usage
Check active connections:
sudo netstat -tlnp | grep frps
This lists every port frps is listening on: the frps server port (7000 in this guide) plus one port for each proxy currently forwarding to a worker.
Restart frps
If frps stops responding:
sudo systemctl restart frps
Workers will automatically reconnect when frps comes back online.
Troubleshooting
Workers can't connect to frps
Symptom: Execution logs show "Connection refused" or timeout errors.
Check:
Verify jump host security group allows port 7000 from workers
Confirm jump host private IP is correct in worker configuration
Check frps is running:
sudo systemctl status frps
Review frps logs:
sudo journalctl -u frps -n 100
Fix: Restart frps or update security group rules.
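To tell a firewall problem from an frps problem, it can help to probe the frps port from a worker, or from any VM in the worker subnet. This is a minimal sketch using bash's /dev/tcp redirection and the coreutils timeout command. The function name tcp_reachable is made up for this example, and 10.0.1.50 is the example private IP used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within the given number of seconds (default 5).
tcp_reachable() {
  local host="$1" port="$2" wait_s="${3:-5}"
  # Inside the child shell, $0 is the host and $1 is the port.
  timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage, run from a worker:
#   tcp_reachable 10.0.1.50 7000 && echo "frps port reachable" \
#     || echo "frps port blocked or frps not running"
```

A timeout usually points to a firewall or security group rule, while an immediate refusal usually means the port is open but nothing is listening on it.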
Users can't connect through jump host
Symptom: SSH connection times out or is refused.
Check:
Verify jump host security group allows port range (10000-50000)
Confirm user is connecting to the public IP, not private IP
Check the port number in execution logs matches SSH command
Verify execution is still running (not completed or failed)
Fix: Update firewall rules or confirm execution status.
Port range exhausted
Symptom: New executions can't enable SSH after many parallel executions.
Check: Count active proxies:
sudo netstat -tlnp | grep frps | wc -l
Fix:
Increase port range (e.g., 10000-60000)
Stop old executions that no longer need SSH
Update worker configuration with new range
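The plain line count above also includes the frps server port itself, so it can overstate usage by one. As a sketch, the filter below counts only listeners whose port falls inside the configured range. The function name count_ports_in_range is hypothetical, and it assumes netstat-style lines where the local address ends in :PORT.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read netstat-style lines on stdin and count those
# whose first "...:PORT" field is within [start, end].
count_ports_in_range() {
  local start="$1" end="$2"
  awk -v s="$start" -v e="$end" '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /:[0-9]+$/) {
          n = $i
          sub(/.*:/, "", n)          # keep only the port number
          if (n + 0 >= s + 0 && n + 0 <= e + 0) c++
          break                      # only the local address counts
        }
      }
    }
    END { print c + 0 }'
}

# Usage on the jump host:
#   sudo netstat -tlnp | grep frps | count_ports_in_range 10000 50000
```

Comparing this number with the size of the range shows how close the jump host is to exhaustion.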
frps high CPU usage
Symptom: Jump host CPU usage near 100%.
Cause: Many simultaneous SSH connections with high traffic.
Fix:
Upgrade jump host to larger instance type
Use multiple jump hosts with different port ranges
Reduce number of parallel SSH sessions
Security considerations
Minimize port range exposure: Only open ports needed for your typical parallel execution count (e.g., if you run max 50 parallel jobs, use 10000-10049, which is exactly 50 ports).
Use IP allowlists: Restrict the port range source to office networks or VPN instead of 0.0.0.0/0.
Monitor unusual activity: Set up alerts for spikes in connections or port scanning attempts.
Regularly rotate jump host: Rebuild jump host every few months as part of security maintenance.
Limit jump host access: Remove administrative SSH access (port 22) after initial setup, or restrict to specific IPs.
Alternative: Bastion host pattern
If you already have a bastion host in your VPC, you can use it instead of a dedicated jump host:
Install frps on existing bastion host
Configure same firewall rules
Update worker configuration with bastion's IP addresses
The frps setup is the same—just use your existing bastion infrastructure.
Cost considerations
Jump host costs:
AWS t3.micro: ~$7/month
GCP e2-micro: ~$6/month
Azure B1s: ~$8/month
Network transfer:
Minimal for SSH sessions (mostly text)
IDE debugging with file sync may increase costs
Monitor network metrics in CloudWatch (AWS), Cloud Monitoring (GCP, formerly Stackdriver), or Azure Monitor for unexpected spikes
Optimization: Use smallest instance type. frps uses minimal resources unless handling many simultaneous connections.
Next steps
After completing this setup:
Test with your team: Have a few users verify SSH debugging works
Document specifics: Note jump host IP and port range in your internal docs
Set up monitoring: Configure alerts for frps downtime or connection issues
Train users: Share the SSH Overview guide with your team
Users can now debug executions by following:
SSH Overview - General SSH debugging guide
VS Code Remote Debugging - VS Code IDE setup
PyCharm Remote Debugging - PyCharm IDE setup
