Workers Behind NAT

Set up SSH debugging for Valohai workers without public IP addresses using a jump host

Some organizations require that worker instances run without public IP addresses for security. This guide shows how to set up SSH debugging in these environments using a jump host with a reverse proxy.

Who needs this guide?

Platform administrators configuring Valohai in networks where:

Worker instances have only private IP addresses
Outbound internet access goes through a NAT gateway
Security policies prohibit public IPs on worker machines

Standard SSH setup: If your workers have public IPs, use the Configure SSH Access guide instead.

Architecture overview

This setup uses frp (Fast Reverse Proxy) to create tunnels from private workers to a public jump host:

User's laptop → Jump host (public IP) → Worker (private IP)
                    ↓
              frps server listening
                    ↑
              Workers connect via frpc client

How it works:

Jump host runs frps server with a public IP
Workers run frpc client and connect to jump host
Users SSH to jump host, which proxies to workers
Each worker gets a unique port on the jump host

Prerequisites

Before starting, you need:

A cloud provider account (AWS, Azure, or GCP)
Permission to create virtual machines and firewall rules
The VPC or virtual network where Valohai workers run
The security group or firewall tag used by workers

Step 1: Create the jump host

Launch a small VM in your Valohai VPC to act as the jump host:

AWS

Instance type: t3.micro or t3.small
AMI: Ubuntu 22.04 LTS
Network: Same VPC as Valohai workers
Public IP: Enabled
Security group: Create new (configured in next step)

GCP

Machine type: e2-micro or e2-small
Image: Ubuntu 22.04 LTS
Network: Same VPC as Valohai workers
External IP: Ephemeral or static
Firewall tags: valohai-jump-host

Azure

VM size: Standard_B1s or Standard_B1ms
Image: Ubuntu 22.04 LTS
Virtual network: Same as Valohai workers
Public IP: Create new
Network security group: Create new

Step 2: Configure firewall rules

Jump host inbound rules

Configure the jump host's firewall to allow:

Rule 1: frps server port (from workers)

Source: Security group of Valohai workers (e.g., valohai-sg-workers)
Protocol: TCP
Port: 7000 (or your chosen frps port)
Purpose: Workers connect to frps server

Rule 2: SSH proxy ports (from users)

Source: 0.0.0.0/0 or specific IP ranges (your office/VPN)
Protocol: TCP
Port range: 10000-50000 (or your chosen range)
Purpose: Users connect to workers through jump host

Rule 3: Administrative SSH (temporary)

Source: Your IP address
Protocol: TCP
Port: 22
Purpose: Initial setup only (can be removed once setup is complete)

AWS example

Edit the jump host's security group:

# Allow frps from workers
aws ec2 authorize-security-group-ingress \
  --group-id sg-jumphost123 \
  --protocol tcp \
  --port 7000 \
  --source-group sg-workers456

# Allow SSH proxy from users
aws ec2 authorize-security-group-ingress \
  --group-id sg-jumphost123 \
  --protocol tcp \
  --port 10000-50000 \
  --cidr 0.0.0.0/0

GCP example

Create firewall rules:

# Allow frps from workers
gcloud compute firewall-rules create valohai-jump-frps \
  --network valohai-vpc \
  --allow tcp:7000 \
  --source-tags valohai-worker \
  --target-tags valohai-jump-host

# Allow SSH proxy from users
gcloud compute firewall-rules create valohai-jump-proxy \
  --network valohai-vpc \
  --allow tcp:10000-50000 \
  --source-ranges 0.0.0.0/0 \
  --target-tags valohai-jump-host
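
The examples above cover AWS and GCP only. For Azure, a comparable pair of network security group rules might look like the sketch below. The resource group, NSG name, rule names, priorities, and worker subnet CIDR are placeholders rather than values from this guide. The script only prints the az commands unless DRY_RUN=0 is set, so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch only: every value below is a hypothetical placeholder.
RESOURCE_GROUP="valohai-rg"
NSG_NAME="valohai-jump-host-nsg"
WORKER_SUBNET="10.0.1.0/24"   # subnet where the Valohai workers run
USER_CIDR="0.0.0.0/0"         # narrow this to your office/VPN range if possible

# Print each command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

create_jump_host_rules() {
  # Allow frps from workers
  run az network nsg rule create \
    --resource-group "$RESOURCE_GROUP" \
    --nsg-name "$NSG_NAME" \
    --name valohai-jump-frps \
    --priority 100 \
    --direction Inbound --access Allow --protocol Tcp \
    --source-address-prefixes "$WORKER_SUBNET" \
    --destination-port-ranges 7000

  # Allow SSH proxy from users
  run az network nsg rule create \
    --resource-group "$RESOURCE_GROUP" \
    --nsg-name "$NSG_NAME" \
    --name valohai-jump-proxy \
    --priority 110 \
    --direction Inbound --access Allow --protocol Tcp \
    --source-address-prefixes "$USER_CIDR" \
    --destination-port-ranges 10000-50000
}

create_jump_host_rules
```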

Step 3: Install frps on jump host

SSH into the jump host and install the frps server:

# Create installation directory
sudo mkdir -p /opt/bin
cd /opt/bin

# Download frps
sudo wget https://dist.valohai.com/frp/frp_0.61.0_linux_amd64/frps.gz
sudo gunzip frps.gz
sudo chmod a+x frps

Verify the installation:

/opt/bin/frps --version

Step 4: Create frps service

Set up frps to run as a systemd service:

sudo systemctl edit --force --full frps.service

Add this configuration:

[Unit]
Description=Fast Reverse Proxy Server
After=network.target

[Service]
Type=simple
ExecStart=/opt/bin/frps --log_level=info
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target

Save and exit the editor.
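
systemctl edit opens an interactive editor, which is awkward in provisioning scripts. As one possible alternative, the sketch below writes the same unit file non-interactively. The function name write_frps_unit and its optional path argument are illustrative and not part of frp or Valohai.

```shell
#!/usr/bin/env bash
# Write the frps unit file to the given path.
# Defaults to the system unit directory, which requires root.
write_frps_unit() {
  local unit_path="${1:-/etc/systemd/system/frps.service}"
  cat > "$unit_path" <<'EOF'
[Unit]
Description=Fast Reverse Proxy Server
After=network.target

[Service]
Type=simple
ExecStart=/opt/bin/frps --log_level=info
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root): write_frps_unit
```

Run it as root on the jump host, then continue with the next step to reload systemd and start the service.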

Step 5: Start and enable frps

Start the frps service and enable it to run on boot:

# Reload systemd configuration
sudo systemctl daemon-reload

# Start frps now and on boot
sudo systemctl enable --now frps

# Verify it's running
sudo systemctl status frps

You should see:

● frps.service - Fast Reverse Proxy Server
   Loaded: loaded (/etc/systemd/system/frps.service; enabled)
   Active: active (running) since ...

Step 6: Configure Valohai workers

Provide the following information to your Valohai contact, or configure it yourself if managing workers manually:

Required information:

Jump host public IP address
Jump host private IP address (for worker connections)
frps port (e.g., 7000)
Port range for user connections (e.g., 10000-50000)

For Valohai-managed workers

Send these details to your Valohai contact. They'll update your worker configuration automatically.

For self-managed workers

Edit your worker prep template and add this to extra-config-json:

{
  "PEON_PORT_FORWARDING_CONFIG": "type=frp,server=<JUMP-HOST-PRIVATE-IP>:7000,server_public=<JUMP-HOST-PUBLIC-IP>,port_range=10000-50000"
}

Example:

{
  "PEON_PORT_FORWARDING_CONFIG": "type=frp,server=10.0.1.50:7000,server_public=54.123.45.67,port_range=10000-50000"
}

Rerun the worker setup script to apply changes.
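
A typo in the configuration string is easy to miss. The helper below is a minimal sketch that assembles the string from its four parts and rejects obviously malformed input. The function name is hypothetical, and the IPv4-only check is an assumption of this sketch based on the example values above.

```shell
#!/usr/bin/env bash
# Build the port forwarding string from its parts, rejecting malformed input.
build_port_forwarding_config() {
  local private_ip="$1" public_ip="$2" frps_port="$3" port_range="$4"
  local ipv4='^([0-9]{1,3}\.){3}[0-9]{1,3}$'

  if ! [[ $private_ip =~ $ipv4 && $public_ip =~ $ipv4 ]]; then
    echo "error: expected IPv4 addresses" >&2; return 1
  fi
  if ! [[ $frps_port =~ ^[0-9]+$ ]]; then
    echo "error: frps port must be numeric" >&2; return 1
  fi
  if ! [[ $port_range =~ ^([0-9]+)-([0-9]+)$ ]]; then
    echo "error: port range must look like 10000-50000" >&2; return 1
  fi
  local first="${BASH_REMATCH[1]}" last="${BASH_REMATCH[2]}"
  # 10# forces base 10 so a leading zero is not read as octal.
  if (( 10#$first > 10#$last || 10#$last > 65535 )); then
    echo "error: port range is reversed or exceeds 65535" >&2; return 1
  fi

  printf 'type=frp,server=%s:%s,server_public=%s,port_range=%s\n' \
    "$private_ip" "$frps_port" "$public_ip" "$port_range"
}

build_port_forwarding_config 10.0.1.50 54.123.45.67 7000 10000-50000
```

The printed value can be pasted as the PEON_PORT_FORWARDING_CONFIG entry in extra-config-json.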

For static worker machines

If you have manually installed Valohai workers on static machines, edit /etc/peon.config:

PORT_FORWARDING_CONFIG=type=frp,server=<JUMP-HOST-PRIVATE-IP>:7000,server_public=<JUMP-HOST-PUBLIC-IP>,port_range=10000-50000

Restart the worker:

sudo systemctl restart peon
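
When several static machines need the same change, editing each file by hand gets tedious. The following sketch sets a KEY=VALUE line idempotently, assuming /etc/peon.config uses one assignment per line as in the example above. The helper name set_config_key is illustrative, and the updated line ends up at the end of the file.

```shell
#!/usr/bin/env bash
# Set KEY=VALUE in a config file: drop any existing line for KEY, then append.
set_config_key() {
  local file="$1" key="$2" value="$3" tmp
  [ -f "$file" ] || { echo "error: $file not found" >&2; return 1; }
  tmp="$(mktemp)" || return 1
  grep -v "^${key}=" "$file" > "$tmp"       # keep every other line
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file"                      # preserves owner and permissions
  rm -f "$tmp"
}

# Usage (as root), followed by the restart shown above:
# set_config_key /etc/peon.config PORT_FORWARDING_CONFIG \
#   "type=frp,server=10.0.1.50:7000,server_public=54.123.45.67,port_range=10000-50000"
```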

Step 7: Verify the setup

Test that the jump host configuration works:

Start a Valohai execution with SSH enabled
Check execution logs for SSH connection details:
```
SSH debugging enabled on 54.123.45.67:12345
```
Note the port number (e.g., 12345)—this is a port in your configured range

Test SSH connection:

ssh -i ~/.ssh/debug-key <JUMP-HOST-PUBLIC-IP> -p 12345 -t /bin/bash

If you can connect and see the execution environment, the setup is complete.
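
To avoid copying the address and port by hand, the log line can be turned into an SSH command. This is a sketch that assumes the log message has exactly the format shown above. The function name and the key path argument are illustrative.

```shell
#!/usr/bin/env bash
# Read execution log text on stdin and print the matching ssh command.
# Returns non-zero if no "SSH debugging enabled on HOST:PORT" line is found.
ssh_command_from_log() {
  local key_path="${1:-$HOME/.ssh/debug-key}" line endpoint
  line="$(grep -m1 -o 'SSH debugging enabled on [0-9.]*:[0-9]*')" || return 1
  endpoint="${line##* }"   # e.g. 54.123.45.67:12345
  printf 'ssh -i %s %s -p %s -t /bin/bash\n' \
    "$key_path" "${endpoint%:*}" "${endpoint##*:}"
}

echo 'SSH debugging enabled on 54.123.45.67:12345' | ssh_command_from_log '~/.ssh/debug-key'
# prints: ssh -i ~/.ssh/debug-key 54.123.45.67 -p 12345 -t /bin/bash
```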

How it works

When a worker starts an execution with SSH enabled:

Worker downloads and runs frpc client
frpc connects to frps server on jump host (port 7000)
frps allocates a port from the range (e.g., 12345)
User connects to jump host on allocated port
frps proxies connection to worker through existing tunnel

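Port allocation: Each execution gets a unique port from the range, so the size of the range caps the number of simultaneous SSH-enabled executions. For example, 100 simultaneous executions occupy 100 ports (10000-10099 if ports happen to be assigned sequentially).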
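
Since every execution holds one port, the number of ports in the range is the upper bound on simultaneous SSH-enabled executions. A small sketch of that arithmetic follows. The helper name is illustrative, and the range is treated as inclusive on both ends.

```shell
#!/usr/bin/env bash
# Number of ports in an inclusive range such as "10000-50000".
ports_in_range() {
  local start="${1%-*}" end="${1#*-}"
  echo $(( end - start + 1 ))
}

ports_in_range 10000-50000   # prints 40001
ports_in_range 10000-10099   # prints 100
```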

Monitoring and maintenance

Check frps status

View frps logs:

sudo journalctl -u frps -f

Look for worker connections:

[I] [proxy.go] new proxy [vh-exec-123] success
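
To see which proxies have registered without reading the whole log, the success lines can be filtered. The sketch below assumes log lines follow the sample format shown above, which may differ between frp versions. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Read frps log lines on stdin and print the name of each registered proxy.
list_registered_proxies() {
  sed -n 's/.*new proxy \[\([^]]*\)] success.*/\1/p'
}

echo '[I] [proxy.go] new proxy [vh-exec-123] success' | list_registered_proxies
# prints: vh-exec-123

# Against the live service:
# sudo journalctl -u frps --no-pager | list_registered_proxies
```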

Monitor port usage

List the ports frps is listening on:

sudo ss -tlnp | grep frps

Besides the frps control port (7000), each listed port is currently proxying to a worker.

Restart frps

If frps stops responding:

sudo systemctl restart frps

Workers will automatically reconnect when frps comes back online.

Troubleshooting

Workers can't connect to frps

Symptom: Execution logs show "Connection refused" or timeout errors.

Check:

Verify jump host security group allows port 7000 from workers
Confirm jump host private IP is correct in worker configuration
Check frps is running: sudo systemctl status frps
Review frps logs: sudo journalctl -u frps -n 100

Fix: Restart frps or update security group rules.
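
A quick way to separate firewall problems from frps problems is to test the TCP path directly from a worker. The sketch below uses bash's built-in /dev/tcp redirection and the coreutils timeout command. The function name and the 3-second limit are arbitrary choices of this sketch.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Returns 2 for a non-numeric port, non-zero otherwise on failure or timeout.
check_tcp() {
  local host="$1" port="$2"
  if ! [[ $port =~ ^[0-9]+$ ]]; then
    echo "error: invalid port: $port" >&2
    return 2
  fi
  # The child shell receives host as $0 and port as $1.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage from a worker (10.0.1.50 is the example jump host private IP):
# check_tcp 10.0.1.50 7000 && echo "frps reachable" || echo "frps unreachable"
```

A timeout usually points to a firewall or routing problem, while an immediate failure suggests that nothing is listening on the port.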

Users can't connect through jump host

Symptom: SSH connection times out or is refused.

Check:

Verify jump host security group allows port range (10000-50000)
Confirm user is connecting to the public IP, not private IP
Check the port number in execution logs matches SSH command
Verify execution is still running (not completed or failed)

Fix: Update firewall rules or confirm execution status.

Port range exhausted

Symptom: New executions can't enable SSH after many parallel executions.

Check: Count the ports frps is listening on (the result includes the control port, so subtract one to get the number of active proxies):

sudo ss -tlnp | grep -c frps

Fix:

Increase port range in the jump host firewall rules (e.g., 10000-60000)
Stop old executions that no longer need SSH
Update worker configuration with new range
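
To compare usage against the configured range, it can help to count only listeners whose port falls inside that range. The sketch below parses ss output from stdin. The function name is illustrative, and it counts any listener in the range, not only those owned by frps.

```shell
#!/usr/bin/env bash
# Read `ss -tlnH` output on stdin and count listeners inside the port range.
# Column 4 is the local address, e.g. *:12345 or [::]:12345.
count_ports_in_range() {
  local start="$1" end="$2"
  awk -v lo="$start" -v hi="$end" '
    {
      n = split($4, parts, ":")
      port = parts[n] + 0
      if (port >= lo && port <= hi) count++
    }
    END { print count + 0 }'
}

# Usage on the jump host:
# ss -tlnH | count_ports_in_range 10000 50000
```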

frps high CPU usage

Symptom: Jump host CPU usage near 100%.

Cause: Many simultaneous SSH connections with high traffic.

Fix:

Upgrade jump host to larger instance type
Use multiple jump hosts with different port ranges
Reduce number of parallel SSH sessions

Security considerations

Minimize port range exposure: Only open ports needed for your typical parallel execution count (e.g., if you run max 50 parallel jobs, use 10000-10049).
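
Because port ranges are inclusive, a range for N parallel jobs starting at port P ends at P + N - 1. A minimal sketch of that calculation, with an illustrative helper name:

```shell
#!/usr/bin/env bash
# Smallest inclusive port range that fits the given number of parallel jobs.
port_range_for_jobs() {
  local first_port="$1" max_jobs="$2"
  echo "${first_port}-$(( first_port + max_jobs - 1 ))"
}

port_range_for_jobs 10000 50   # prints 10000-10049
```

Use the same range in the firewall rule and in the worker configuration so the two stay consistent.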

Use IP allowlists: Restrict the port range source to office networks or VPN instead of 0.0.0.0/0.

Monitor unusual activity: Set up alerts for spikes in connections or port scanning attempts.

Regularly rotate the jump host: Rebuild the jump host every few months as part of security maintenance.

Limit jump host access: Remove administrative SSH access (port 22) after initial setup, or restrict to specific IPs.

Alternative: Bastion host pattern

If you already have a bastion host in your VPC, you can use it instead of a dedicated jump host:

Install frps on existing bastion host
Configure same firewall rules
Update worker configuration with bastion's IP addresses

The frps setup is the same—just use your existing bastion infrastructure.

Cost considerations

Jump host costs:

AWS t3.micro: ~$7/month
GCP e2-micro: ~$6/month
Azure B1s: ~$8/month

Network transfer:

Minimal for SSH sessions (mostly text)
IDE debugging with file sync may increase costs
Monitor CloudWatch, Cloud Monitoring (formerly Stackdriver), or Azure Monitor for unexpected spikes

Optimization: Use the smallest instance type that keeps up with your load. frps uses minimal resources unless it is handling many simultaneous connections.

Next steps

After completing this setup:

Test with your team: Have a few users verify SSH debugging works
Document specifics: Note jump host IP and port range in your internal docs
Set up monitoring: Configure alerts for frps downtime or connection issues
Train users: Share the SSH Overview guide with your team

Users can now debug executions by following:

SSH Overview - General SSH debugging guide
VS Code Remote Debugging - VS Code IDE setup
PyCharm Remote Debugging - PyCharm IDE setup
