Workers Behind NAT

Set up SSH debugging for Valohai workers without public IP addresses using a jump host

Some organizations require worker instances to run without public IP addresses for security reasons. This guide shows how to set up SSH debugging in these environments using a jump host that runs a reverse proxy.

Who needs this guide?

Platform administrators configuring Valohai in networks where:

  • Worker instances have only private IP addresses

  • Outbound internet access goes through a NAT gateway

  • Security policies prohibit public IPs on worker machines

Standard SSH setup: If your workers have public IPs, use the Configure SSH Access guide instead.

Architecture overview

This setup uses frp (Fast Reverse Proxy) to create tunnels from private workers to a public jump host:

User's laptop → Jump host (public IP) → Worker (private IP)
                frps server listening
                Workers connect via frpc client

How it works:

  1. Jump host runs frps server with a public IP

  2. Workers run frpc client and connect to jump host

  3. Users SSH to jump host, which proxies to workers

  4. Each worker gets a unique port on the jump host

Prerequisites

Before starting, you need:

  • A cloud provider account (AWS, Azure, or GCP)

  • Permission to create virtual machines and firewall rules

  • The VPC or virtual network where Valohai workers run

  • The security group or firewall tag used by workers

Step 1: Create the jump host

Launch a small VM in your Valohai VPC to act as the jump host:

AWS

  • Instance type: t3.micro or t3.small

  • AMI: Ubuntu 22.04 LTS

  • Network: Same VPC as Valohai workers

  • Public IP: Enabled

  • Security group: Create new (configured in next step)

GCP

  • Machine type: e2-micro or e2-small

  • Image: Ubuntu 22.04 LTS

  • Network: Same VPC as Valohai workers

  • External IP: Ephemeral or static

  • Firewall tags: valohai-jump-host

Azure

  • VM size: Standard_B1s or Standard_B1ms

  • Image: Ubuntu 22.04 LTS

  • Virtual network: Same as Valohai workers

  • Public IP: Create new

  • Network security group: Create new

Step 2: Configure firewall rules

Jump host inbound rules

Configure the jump host's firewall to allow:

Rule 1: frps server port (from workers)

  • Source: Security group of Valohai workers (e.g., valohai-sg-workers)

  • Protocol: TCP

  • Port: 7000 (or your chosen frps port)

  • Purpose: Workers connect to frps server

Rule 2: SSH proxy ports (from users)

  • Source: 0.0.0.0/0 or specific IP ranges (your office/VPN)

  • Protocol: TCP

  • Port range: 10000-50000 (or your chosen range)

  • Purpose: Users connect to workers through jump host

Rule 3: Administrative SSH (temporary)

  • Source: Your IP address

  • Protocol: TCP

  • Port: 22

  • Purpose: Initial setup only (can be removed after setup)

AWS example

Edit the jump host's security group:

# Allow frps from workers
aws ec2 authorize-security-group-ingress \
  --group-id sg-jumphost123 \
  --protocol tcp \
  --port 7000 \
  --source-group sg-workers456

# Allow SSH proxy from users
aws ec2 authorize-security-group-ingress \
  --group-id sg-jumphost123 \
  --protocol tcp \
  --port 10000-50000 \
  --cidr 0.0.0.0/0

GCP example

Create firewall rules:

# Allow frps from workers
gcloud compute firewall-rules create valohai-jump-frps \
  --network valohai-vpc \
  --allow tcp:7000 \
  --source-tags valohai-worker \
  --target-tags valohai-jump-host

# Allow SSH proxy from users
gcloud compute firewall-rules create valohai-jump-proxy \
  --network valohai-vpc \
  --allow tcp:10000-50000 \
  --source-ranges 0.0.0.0/0 \
  --target-tags valohai-jump-host
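Azure example

Step 1 lists Azure, but the examples above cover only AWS and GCP. A rough Azure equivalent of Rules 1 and 2 could look like the sketch below. The resource group, NSG name, and address ranges are hypothetical placeholders, and identifying workers by subnet CIDR (rather than by a security group reference) is an assumption about your network layout. The function only prints the commands so they can be reviewed first.

```shell
# Hypothetical placeholders: replace with your own values.
RESOURCE_GROUP="valohai-rg"
NSG_NAME="valohai-jump-nsg"
WORKER_SUBNET="10.0.1.0/24"    # assumption: workers are identified by subnet CIDR
USER_SOURCE="203.0.113.0/24"   # office/VPN range (documentation example range)

# Print one `az network nsg rule create` command; nothing is executed.
# $1 = rule name, $2 = priority, $3 = source prefix, $4 = port or port range
nsg_rule() {
  printf 'az network nsg rule create --resource-group %s --nsg-name %s --name %s --priority %s --direction Inbound --access Allow --protocol Tcp --source-address-prefixes %s --destination-port-ranges %s\n' \
    "$RESOURCE_GROUP" "$NSG_NAME" "$1" "$2" "$3" "$4"
}

azure_jump_rules() {
  nsg_rule valohai-jump-frps  100 "$WORKER_SUBNET" 7000         # Rule 1: workers to frps
  nsg_rule valohai-jump-proxy 110 "$USER_SOURCE"   10000-50000  # Rule 2: users to proxy ports
}
```

Run azure_jump_rules to review the generated commands, then azure_jump_rules | sh to apply them.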

Step 3: Install frps on jump host

SSH into the jump host and install the frps server:

# Create installation directory
sudo mkdir -p /opt/bin
cd /opt/bin

# Download frps
sudo wget https://dist.valohai.com/frp/frp_0.61.0_linux_amd64/frps.gz
sudo gunzip frps.gz
sudo chmod a+x frps

Verify the installation:

/opt/bin/frps --version

Step 4: Create frps service

Set up frps to run as a systemd service:

sudo systemctl edit --force --full frps.service

Add this configuration:

[Unit]
Description=Fast Reverse Proxy Server
After=network.target

[Service]
Type=simple
ExecStart=/opt/bin/frps --log_level=info
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target

Save and exit the editor.

Step 5: Start and enable frps

Start the frps service and enable it to run on boot:

# Reload systemd configuration
sudo systemctl daemon-reload

# Start frps now and on boot
sudo systemctl enable --now frps

# Verify it's running
sudo systemctl status frps

You should see:

● frps.service - Fast Reverse Proxy Server
   Loaded: loaded (/etc/systemd/system/frps.service; enabled)
   Active: active (running) since ...

Step 6: Configure Valohai workers

Provide the following information to your Valohai contact, or configure it yourself if managing workers manually:

Required information:

  • Jump host public IP address

  • Jump host private IP address (for worker connections)

  • frps port (e.g., 7000)

  • Port range for user connections (e.g., 10000-50000)

For Valohai-managed workers

Send these details to your Valohai contact. They'll update your worker configuration automatically.

For self-managed workers

Edit your worker prep template and add this to extra-config-json:

{
  "PEON_PORT_FORWARDING_CONFIG": "type=frp,server=<JUMP-HOST-PRIVATE-IP>:7000,server_public=<JUMP-HOST-PUBLIC-IP>,port_range=10000-50000"
}

Example:

{
  "PEON_PORT_FORWARDING_CONFIG": "type=frp,server=10.0.1.50:7000,server_public=54.123.45.67,port_range=10000-50000"
}

Rerun the worker setup script to apply changes.
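Because the value is a single comma-separated string, typos are easy to make. The following sketch assembles it from its four parts and rejects a malformed port range. The function names are hypothetical helpers, not part of Valohai's tooling.

```shell
# Assemble the port forwarding value from its four parts.
# $1 = jump host private IP, $2 = jump host public IP,
# $3 = frps port, $4 = port range (LOW-HIGH)
port_forwarding_config() {
  case "$4" in
    "" | *[!0-9-]* | -* | *- | *-*-*)
      echo "invalid port range: $4" >&2; return 1 ;;
    *-*) ;;                                            # exactly LOW-HIGH
    *) echo "invalid port range: $4" >&2; return 1 ;;  # a single number
  esac
  printf 'type=frp,server=%s:%s,server_public=%s,port_range=%s\n' "$1" "$3" "$2" "$4"
}

# Wrap the value in the JSON object shown above for extra-config-json.
extra_config_json() {
  value=$(port_forwarding_config "$@") || return 1
  printf '{\n  "PEON_PORT_FORWARDING_CONFIG": "%s"\n}\n' "$value"
}
```

For example, extra_config_json 10.0.1.50 54.123.45.67 7000 10000-50000 prints the JSON from the example above.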

For static worker machines

If you have manually installed Valohai workers on static machines, edit /etc/peon.config:

PORT_FORWARDING_CONFIG=type=frp,server=<JUMP-HOST-PRIVATE-IP>:7000,server_public=<JUMP-HOST-PUBLIC-IP>,port_range=10000-50000

Restart the worker:

sudo systemctl restart peon
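If you maintain several static machines, the edit can be scripted. This is a minimal sketch that assumes /etc/peon.config holds one KEY=value setting per line, as the line above suggests; set_port_forwarding is a hypothetical helper name.

```shell
# Replace (or add) the PORT_FORWARDING_CONFIG line in a KEY=value config file.
# $1 = config file path, $2 = new value
set_port_forwarding() {
  [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
  tmp=$(mktemp) || return 1
  # Keep every other setting, drop any previous PORT_FORWARDING_CONFIG line.
  grep -v '^PORT_FORWARDING_CONFIG=' "$1" > "$tmp" || true
  printf 'PORT_FORWARDING_CONFIG=%s\n' "$2" >> "$tmp"
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$1"
  rm -f "$tmp"
}
```

Run it from a root shell against /etc/peon.config, then restart the worker as shown above.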

Step 7: Verify the setup

Test that the jump host configuration works:

  1. Start a Valohai execution with SSH enabled

  2. Check execution logs for SSH connection details:

    SSH debugging enabled on 54.123.45.67:12345

  3. Note the port number (e.g., 12345). It is one of the ports in your configured range.

  4. Test SSH connection:

    ssh -i ~/.ssh/debug-key <JUMP-HOST-PUBLIC-IP> -p 12345 -t /bin/bash

If you can connect and see the execution environment, the setup is complete.
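To avoid copying the host and port by hand, the log line can be turned into an ssh command. The sketch below assumes the log line ends with HOST:PORT exactly as in the sample above; the real log wording may differ, and ssh_command_from_log is a hypothetical helper.

```shell
# Turn "SSH debugging enabled on HOST:PORT" into an ssh command line.
# $1 = log line, $2 = path to the private key
ssh_command_from_log() {
  endpoint=${1##* }      # last space-separated word, e.g. 54.123.45.67:12345
  host=${endpoint%:*}    # text before the final colon
  port=${endpoint##*:}   # text after the final colon
  case "$port" in
    "" | *[!0-9]*) echo "no port found in: $1" >&2; return 1 ;;
  esac
  printf 'ssh -i %s -p %s -t %s /bin/bash\n' "$2" "$port" "$host"
}
```

The function only prints the command, so it can be checked before it is run.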

How it works

When a worker starts an execution with SSH enabled:

  1. Worker downloads and runs frpc client

  2. frpc connects to frps server on jump host (port 7000)

  3. frps allocates a port from the range (e.g., 12345)

  4. User connects to jump host on allocated port

  5. frps proxies connection to worker through existing tunnel

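Port allocation: Each execution gets a unique port from the range, so 100 simultaneous executions occupy 100 ports at once (for example 10000-10099 if ports are assigned in order). The size of the range therefore caps the number of executions that can have SSH enabled at the same time.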
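Since every execution needs its own port, the number of ports in the range is an upper bound on concurrent SSH-enabled executions. A small helper (hypothetical name) for checking a range:

```shell
# Number of ports in an inclusive LOW-HIGH range.
# $1 = port range, e.g. 10000-50000
range_size() {
  low=${1%-*}     # text before the dash
  high=${1#*-}    # text after the dash
  echo $(( high - low + 1 ))
}

range_size 10000-50000   # prints 40001
```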

Monitoring and maintenance

Check frps status

View frps logs:

sudo journalctl -u frps -f

Look for worker connections:

[I] [proxy.go] new proxy [vh-exec-123] success

Monitor port usage

Check active connections:

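sudo ss -tlnp | grep frps

This lists the ports frps is listening on: the server port (7000) plus one port per active proxy to a worker. The ss tool is used here because netstat is not installed by default on Ubuntu 22.04.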

Restart frps

If frps stops responding:

sudo systemctl restart frps

Workers will automatically reconnect when frps comes back online.

Troubleshooting

Workers can't connect to frps

Symptom: Execution logs show "Connection refused" or timeout errors.

Check:

  1. Verify jump host security group allows port 7000 from workers

  2. Confirm jump host private IP is correct in worker configuration

  3. Check frps is running: sudo systemctl status frps

  4. Review frps logs: sudo journalctl -u frps -n 100

Fix: Restart frps or update security group rules.
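To tell a firewall problem from a stopped frps, probe the frps port from a machine in the worker network. This is a minimal sketch that assumes bash and the coreutils timeout command are available; probe_port is a hypothetical helper. A timeout usually points at a firewall rule silently dropping traffic, while an immediate failure suggests nothing is listening.

```shell
# Probe a TCP port and print "open", "timeout", or "closed".
# $1 = host, $2 = port
probe_port() {
  # bash's /dev/tcp pseudo-device opens a TCP connection without extra tools.
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null; then
    echo "open"
  else
    rc=$?
    if [ "$rc" -eq 124 ]; then
      echo "timeout"   # coreutils timeout exit code: no answer within 3 seconds
    else
      echo "closed"    # connection refused or another error
    fi
  fi
}
```

For example, probe_port 10.0.1.50 7000 checks the jump host address used in the earlier example.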

Users can't connect through jump host

Symptom: SSH connection times out or is refused.

Check:

  1. Verify jump host security group allows port range (10000-50000)

  2. Confirm user is connecting to the public IP, not private IP

  3. Check the port number in execution logs matches SSH command

  4. Verify execution is still running (not completed or failed)

Fix: Update firewall rules or confirm execution status.

Port range exhausted

Symptom: New executions can't enable SSH after many parallel executions.

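Check: Count the ports frps is listening on. The result typically includes the frps server port (7000), so subtract one to get the number of active proxies:

sudo ss -tlnp | grep frps | wc -l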

Fix:

  • Increase port range (e.g., 10000-60000)

  • Stop old executions that no longer need SSH

  • Update worker configuration with new range
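To see exactly which ports of the range are taken, filter the listening sockets by port number. The sketch below reads ss -Htln output on stdin and assumes that only frps listens inside the proxy range on the jump host; ports_in_range is a hypothetical helper.

```shell
# Read `ss -Htln` output on stdin and print listening ports within LOW..HIGH.
# $1 = low port, $2 = high port
ports_in_range() {
  awk -v lo="$1" -v hi="$2" '{
    n = split($4, parts, ":")   # field 4 is Local Address:Port
    port = parts[n] + 0         # piece after the final colon, as a number
    if (port >= lo && port <= hi) print port
  }' | sort -n | uniq
}
```

For example, ss -Htln | ports_in_range 10000 50000 | wc -l counts the ports in use, which can be compared with the size of the range.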

frps high CPU usage

Symptom: Jump host CPU usage near 100%.

Cause: Many simultaneous SSH connections with high traffic.

Fix:

  • Upgrade jump host to larger instance type

  • Use multiple jump hosts with different port ranges

  • Reduce number of parallel SSH sessions
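If you spread load over several jump hosts, each one needs its own non-overlapping port range. The sketch below divides a range into contiguous parts. It only does the arithmetic; how worker groups are pointed at each jump host depends on your worker configuration, and split_port_range is a hypothetical helper.

```shell
# Split an inclusive port range into N contiguous sub-ranges, one per jump host.
# $1 = low port, $2 = high port, $3 = number of jump hosts
split_port_range() {
  total=$(( $2 - $1 + 1 ))
  start=$1
  i=0
  while [ "$i" -lt "$3" ]; do
    size=$(( total / $3 ))
    # Spread the remainder over the first (total % N) hosts.
    [ "$i" -lt $(( total % $3 )) ] && size=$(( size + 1 ))
    echo "$start-$(( start + size - 1 ))"
    start=$(( start + size ))
    i=$(( i + 1 ))
  done
}

split_port_range 10000 50000 2
# prints:
# 10000-30000
# 30001-50000
```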

Security considerations

Minimize port range exposure: Only open as many ports as your typical parallel execution count requires (e.g., if you run at most 50 parallel jobs, 10000-10049 is enough). Use the same range in the firewall rule and in the worker port_range setting.

Use IP allowlists: Restrict the port range source to office networks or VPN instead of 0.0.0.0/0.

Monitor unusual activity: Set up alerts for spikes in connections or port scanning attempts.

Regularly rotate jump host: Rebuild jump host every few months as part of security maintenance.

Limit jump host access: Remove administrative SSH access (port 22) after initial setup, or restrict to specific IPs.

Alternative: Bastion host pattern

If you already have a bastion host in your VPC, you can use it instead of a dedicated jump host:

  1. Install frps on existing bastion host

  2. Configure same firewall rules

  3. Update worker configuration with bastion's IP addresses

The frps setup is the same; just use your existing bastion infrastructure.

Cost considerations

Jump host costs:

  • AWS t3.micro: ~$7/month

  • GCP e2-micro: ~$6/month

  • Azure B1s: ~$8/month

Network transfer:

  • Minimal for SSH sessions (mostly text)

  • IDE debugging with file sync may increase costs

  • Monitor CloudWatch, Cloud Monitoring (formerly Stackdriver), or Azure Monitor for unexpected spikes

Optimization: Use smallest instance type. frps uses minimal resources unless handling many simultaneous connections.

Next steps

After completing this setup:

  1. Test with your team: Have a few users verify SSH debugging works

  2. Document specifics: Note jump host IP and port range in your internal docs

  3. Set up monitoring: Configure alerts for frps downtime or connection issues

  4. Train users: Share the SSH Overview guide with your team

Users can now debug executions by following:
