# Workers Behind NAT

Some organizations require that worker instances run without public IP addresses for security. This guide shows how to set up SSH debugging in these environments using a jump host that runs a reverse proxy.

## Who needs this guide?

**Platform administrators** configuring Valohai in networks where:

* Worker instances have only private IP addresses
* Outbound internet access goes through a NAT gateway
* Security policies prohibit public IPs on worker machines

**Standard SSH setup**: If your workers have public IPs, use the [Configure SSH Access](/installation-and-setup/advanced-topics/configure-ssh-access.md) guide instead.

## Architecture overview

This setup uses **frp** (Fast Reverse Proxy) to create tunnels from private workers to a public jump host:

```
User's laptop → Jump host (public IP) → Worker (private IP)
                    ↓
              frps server listening
                    ↑
              Workers connect via frpc client
```

**How it works:**

1. Jump host runs `frps` server with a public IP
2. Workers run `frpc` client and connect to jump host
3. Users SSH to an allocated port on the jump host, and the jump host forwards the connection to the worker
4. Each SSH-enabled execution gets a unique port on the jump host

## Prerequisites

Before starting, you need:

* A cloud provider account (AWS, Azure, or GCP)
* Permission to create virtual machines and firewall rules
* The VPC or virtual network where Valohai workers run
* The security group or firewall tag used by workers

## Step 1: Create the jump host

Launch a small VM in your Valohai VPC to act as the jump host:

### AWS

* **Instance type:** `t3.micro` or `t3.small`
* **AMI:** Ubuntu 22.04 LTS
* **Network:** Same VPC as Valohai workers
* **Public IP:** Enabled
* **Security group:** Create new (configured in next step)

### GCP

* **Machine type:** `e2-micro` or `e2-small`
* **Image:** Ubuntu 22.04 LTS
* **Network:** Same VPC as Valohai workers
* **External IP:** Ephemeral or static
* **Firewall tags:** `valohai-jump-host`

### Azure

* **VM size:** `Standard_B1s` or `Standard_B1ms`
* **Image:** Ubuntu 22.04 LTS
* **Virtual network:** Same as Valohai workers
* **Public IP:** Create new
* **Network security group:** Create new

## Step 2: Configure firewall rules

### Jump host inbound rules

Configure the jump host's firewall to allow:

**Rule 1: frps server port (from workers)**

* **Source:** Security group of Valohai workers (e.g., `valohai-sg-workers`)
* **Protocol:** TCP
* **Port:** `7000` (or your chosen frps port)
* **Purpose:** Workers connect to frps server

**Rule 2: SSH proxy ports (from users)**

* **Source:** `0.0.0.0/0` or specific IP ranges (your office/VPN)
* **Protocol:** TCP
* **Port range:** `10000-50000` (or your chosen range)
* **Purpose:** Users connect to workers through jump host

**Rule 3: Administrative SSH (temporary)**

* **Source:** Your IP address
* **Protocol:** TCP
* **Port:** `22`
* **Purpose:** Initial setup only (can remove after setup)
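Before creating these rules, you can sanity-check the port range you plan to open. The sketch below uses a hypothetical `validate_port_range` helper (not part of Valohai or frp). It checks that the range is well-formed, stays within 1024-65535, and does not include the frps port, which it assumes is `7000` unless you pass another value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a "LOW-HIGH" port range before opening it
# in the jump host firewall. Returns 0 if usable, 1 otherwise.
validate_port_range() {
  local range="$1" frps_port="${2:-7000}"
  local low high
  # Require the exact form LOW-HIGH, digits only
  if [[ ! "$range" =~ ^([0-9]+)-([0-9]+)$ ]]; then
    echo "invalid format: '$range' (expected LOW-HIGH)" >&2
    return 1
  fi
  low=$((10#${BASH_REMATCH[1]}))
  high=$((10#${BASH_REMATCH[2]}))
  # Stay above privileged ports, below the TCP maximum, and in order
  if (( low < 1024 || high > 65535 || low > high )); then
    echo "range out of bounds: $range" >&2
    return 1
  fi
  # The proxy range must not overlap the port frps itself listens on
  if (( frps_port >= low && frps_port <= high )); then
    echo "range $range includes the frps port $frps_port" >&2
    return 1
  fi
  return 0
}
```

For example, `validate_port_range 10000-50000` succeeds, while `validate_port_range 5000-8000` fails because it contains port `7000`.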

### AWS example

Edit the jump host's security group:

```shell
# Allow frps from workers
aws ec2 authorize-security-group-ingress \
  --group-id sg-jumphost123 \
  --protocol tcp \
  --port 7000 \
  --source-group sg-workers456

# Allow SSH proxy from users
aws ec2 authorize-security-group-ingress \
  --group-id sg-jumphost123 \
  --protocol tcp \
  --port 10000-50000 \
  --cidr 0.0.0.0/0
```

### GCP example

Create firewall rules:

```shell
# Allow frps from workers
gcloud compute firewall-rules create valohai-jump-frps \
  --network valohai-vpc \
  --allow tcp:7000 \
  --source-tags valohai-worker \
  --target-tags valohai-jump-host

# Allow SSH proxy from users
gcloud compute firewall-rules create valohai-jump-proxy \
  --network valohai-vpc \
  --allow tcp:10000-50000 \
  --source-ranges 0.0.0.0/0 \
  --target-tags valohai-jump-host
```

## Step 3: Install frps on jump host

SSH into the jump host and install the frps server:

```shell
# Create installation directory
sudo mkdir -p /opt/bin
cd /opt/bin

# Download frps
sudo wget https://dist.valohai.com/frp/frp_0.61.0_linux_amd64/frps.gz
sudo gunzip frps.gz
sudo chmod a+x frps
```

Verify the installation:

```shell
/opt/bin/frps --version
```
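The download above is the `linux_amd64` build, so it runs only on x86-64 jump hosts. The instance types suggested in Step 1 are x86-64, but an Arm-based instance would not run this binary. A hypothetical guard you could run before the download:

```shell
#!/usr/bin/env bash
# Hypothetical guard: the frps binary above is built for linux_amd64,
# so refuse to continue on other CPU architectures (e.g., Arm instances).
require_amd64() {
  local machine="${1:-$(uname -m)}"   # defaults to this host's architecture
  case "$machine" in
    x86_64|amd64) return 0 ;;
    *)
      echo "the linux_amd64 frps build will not run on '$machine'" >&2
      return 1
      ;;
  esac
}
```

In a setup script, `require_amd64 || exit 1` placed before the `wget` line stops early on an unsuitable host.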

## Step 4: Create frps service

Set up frps to run as a systemd service:

```shell
sudo systemctl edit --force --full frps.service
```

Add this configuration:

```ini
[Unit]
Description=Fast Reverse Proxy Server
After=network.target

[Service]
Type=simple
ExecStart=/opt/bin/frps --log_level=info
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
```

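Save and exit the editor. With no configuration file, frps listens on its default port, `7000`. If you chose a different frps port in Step 2, add `--bind_port=<PORT>` to `ExecStart`.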
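`systemctl edit --force --full` opens an interactive editor. For scripted setups, such as instance user data, the same unit can be written without an editor. A minimal sketch with a hypothetical `print_frps_unit` helper that prints the unit from above; the target path `/etc/systemd/system/frps.service` is the one shown in the `systemctl status` output in Step 5.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the frps systemd unit from Step 4 so it can be
# written to disk non-interactively (e.g., from a provisioning script).
print_frps_unit() {
  cat <<'EOF'
[Unit]
Description=Fast Reverse Proxy Server
After=network.target

[Service]
Type=simple
ExecStart=/opt/bin/frps --log_level=info
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF
}
```

Write it with `print_frps_unit | sudo tee /etc/systemd/system/frps.service >/dev/null`, then continue with Step 5.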

## Step 5: Start and enable frps

Start the frps service and enable it to run on boot:

```shell
# Reload systemd configuration
sudo systemctl daemon-reload

# Start frps now and on boot
sudo systemctl enable --now frps

# Verify it's running
sudo systemctl status frps
```

You should see:

```
● frps.service - Fast Reverse Proxy Server
   Loaded: loaded (/etc/systemd/system/frps.service; enabled)
   Active: active (running) since ...
```

## Step 6: Configure Valohai workers

Provide the following information to your Valohai contact, or configure it yourself if managing workers manually:

**Required information:**

* Jump host **public IP address**
* Jump host **private IP address** (for worker connections)
* **frps port** (e.g., `7000`)
* **Port range** for user connections (e.g., `10000-50000`)

### For Valohai-managed workers

Send these details to your Valohai contact. They'll update your worker configuration automatically.

### For self-managed workers

Edit your worker prep template and add this to `extra-config-json`:

```json
{
  "PEON_PORT_FORWARDING_CONFIG": "type=frp,server=<JUMP-HOST-PRIVATE-IP>:7000,server_public=<JUMP-HOST-PUBLIC-IP>,port_range=10000-50000"
}
```

**Example:**

```json
{
  "PEON_PORT_FORWARDING_CONFIG": "type=frp,server=10.0.1.50:7000,server_public=54.123.45.67,port_range=10000-50000"
}
```

Rerun the worker setup script to apply changes.
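If you generate worker configuration from scripts, the value can be assembled from the four items listed under **Required information**. A sketch with a hypothetical `build_forwarding_config` helper, assuming the exact `key=value` layout of the examples above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the port forwarding value used in
# PEON_PORT_FORWARDING_CONFIG (extra-config-json), following the layout above.
build_forwarding_config() {
  local private_ip="$1" frps_port="$2" public_ip="$3" port_range="$4"
  if [[ -z "$private_ip" || -z "$frps_port" || -z "$public_ip" || -z "$port_range" ]]; then
    echo "usage: build_forwarding_config PRIVATE_IP FRPS_PORT PUBLIC_IP PORT_RANGE" >&2
    return 1
  fi
  printf 'type=frp,server=%s:%s,server_public=%s,port_range=%s\n' \
    "$private_ip" "$frps_port" "$public_ip" "$port_range"
}

# Prints the value from the example above:
# type=frp,server=10.0.1.50:7000,server_public=54.123.45.67,port_range=10000-50000
build_forwarding_config 10.0.1.50 7000 54.123.45.67 10000-50000
```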

### For static worker machines

If you have manually installed Valohai workers on static machines, edit `/etc/peon.config`:

```shell
PORT_FORWARDING_CONFIG=type=frp,server=<JUMP-HOST-PRIVATE-IP>:7000,server_public=<JUMP-HOST-PUBLIC-IP>,port_range=10000-50000
```

Restart the worker:

```shell
sudo systemctl restart peon
```
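To apply that edit repeatably, for example across several static machines, a script can remove any existing `PORT_FORWARDING_CONFIG=` line and append the new one. A minimal sketch with a hypothetical `set_port_forwarding` helper; the file path is an argument, so you can try it on a copy before pointing it at `/etc/peon.config`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set PORT_FORWARDING_CONFIG in a peon config file.
# Any existing PORT_FORWARDING_CONFIG line is removed and the new one is
# appended; all other lines are kept as-is.
set_port_forwarding() {
  local file="$1" value="$2"
  local tmp
  tmp="$(mktemp)" || return 1
  # Keep every line except an existing PORT_FORWARDING_CONFIG entry
  grep -v '^PORT_FORWARDING_CONFIG=' "$file" > "$tmp" || true
  printf 'PORT_FORWARDING_CONFIG=%s\n' "$value" >> "$tmp"
  # Write back in place so the file keeps its owner and permissions
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Run it as root against `/etc/peon.config`, then restart `peon` as shown above.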

## Step 7: Verify the setup

Test that the jump host configuration works:

1. Start a Valohai execution with SSH enabled
2. Check execution logs for SSH connection details:

   ```
   SSH debugging enabled on 54.123.45.67:12345
   ```
3. Note the port number (e.g., `12345`)—this is a port in your configured range
4. Test SSH connection:

   ```shell
   ssh -i ~/.ssh/debug-key <JUMP-HOST-PUBLIC-IP> -p 12345 -t /bin/bash
   ```

If you can connect and see the execution environment, the setup is complete.
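If you repeat this test often, the SSH command can be derived from the log line. A sketch with a hypothetical `ssh_command_from_log` helper, assuming the log line has the `SSH debugging enabled on HOST:PORT` form shown above and that the key defaults to `~/.ssh/debug-key` as in the example command:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an "SSH debugging enabled on HOST:PORT" log line
# into the ssh command from the test above. Prints the command; does not run it.
ssh_command_from_log() {
  local line="$1" key="${2:-$HOME/.ssh/debug-key}"
  local re='SSH debugging enabled on ([^:[:space:]]+):([0-9]+)'
  if [[ $line =~ $re ]]; then
    printf 'ssh -i %s %s -p %s -t /bin/bash\n' \
      "$key" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    echo "no SSH endpoint found in: $line" >&2
    return 1
  fi
}
```

For example, `ssh_command_from_log 'SSH debugging enabled on 54.123.45.67:12345'` prints the `ssh` command for that host and port, ready to copy.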

## How it works

When a worker starts an execution with SSH enabled:

1. Worker downloads and runs `frpc` client
2. `frpc` connects to `frps` server on jump host (port 7000)
3. `frps` allocates a port from the range (e.g., 12345)
4. User connects to jump host on allocated port
5. `frps` proxies connection to worker through existing tunnel

**Port allocation:** Each execution gets a unique port from the range, so 100 simultaneous SSH-enabled executions occupy 100 ports from the range at once.
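An inclusive range `LOW-HIGH` contains `HIGH - LOW + 1` ports, which caps how many SSH-enabled executions can run at once through one jump host. A minimal sketch of that arithmetic with a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of ports in an inclusive "LOW-HIGH" range,
# i.e., the maximum number of SSH-enabled executions that can run at once.
port_range_capacity() {
  local low="${1%-*}" high="${1#*-}"
  echo $(( high - low + 1 ))
}

port_range_capacity 10000-50000   # 40001 ports
port_range_capacity 10000-10049   # 50 ports
```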

## Monitoring and maintenance

### Check frps status

View frps logs:

```shell
sudo journalctl -u frps -f
```

Look for worker connections:

```
[I] [proxy.go] new proxy [vh-exec-123] success
```

### Monitor port usage

List the ports frps is currently listening on:

```shell
sudo ss -tlnp | grep frps
```

Apart from the frps port itself (`7000`), each listed port is a proxy port allocated to an execution.
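The frps server port itself (`7000`) appears among those listeners too, so leave it out when counting proxy ports. A sketch with a hypothetical `count_frps_proxy_ports` helper that reads `ss -tlnp` output on stdin; it assumes the default `ss` column layout, where the local address is the fourth column:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count ports frps is listening on, excluding its own
# server port (default 7000). Reads `ss -tlnp` output on stdin.
count_frps_proxy_ports() {
  local frps_port="${1:-7000}"
  awk -v skip="$frps_port" '
    /"frps"/ {
      n = split($4, parts, ":")   # local address:port is the 4th column
      if (parts[n] != skip) count++
    }
    END { print count + 0 }
  '
}
```

On the jump host: `sudo ss -tlnp | count_frps_proxy_ports 7000`.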

### Restart frps

If frps stops responding:

```shell
sudo systemctl restart frps
```

Workers will automatically reconnect when frps comes back online.

## Troubleshooting

### Workers can't connect to frps

**Symptom:** Execution logs show "Connection refused" or timeout errors.

**Check:**

1. Verify jump host security group allows port 7000 from workers
2. Confirm jump host private IP is correct in worker configuration
3. Check frps is running: `sudo systemctl status frps`
4. Review frps logs: `sudo journalctl -u frps -n 100`

**Fix:** Restart frps or update security group rules.
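To test the first check directly, open a TCP connection to the frps port from a worker, or from another machine in the worker security group. A minimal sketch with a hypothetical `tcp_reachable` helper that relies only on bash's `/dev/tcp` redirection and coreutils `timeout`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened
# within TIMEOUT seconds (default 3), using bash's /dev/tcp redirection.
tcp_reachable() {
  local host="$1" port="$2" secs="${3:-3}"
  if timeout "$secs" bash -c ">/dev/tcp/$host/$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "NOT reachable: $host:$port" >&2
    return 1
  fi
}
```

For example, run `tcp_reachable 10.0.1.50 7000` from a worker. The same check, run from a user's machine against the jump host's public IP and an allocated port, also helps with the next problem.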

### Users can't connect through jump host

**Symptom:** SSH connection times out or is refused.

**Check:**

1. Verify jump host security group allows port range (10000-50000)
2. Confirm user is connecting to the public IP, not private IP
3. Check the port number in execution logs matches SSH command
4. Verify execution is still running (not completed or failed)

**Fix:** Update firewall rules or confirm execution status.

### Port range exhausted

**Symptom:** New executions can't enable SSH after many parallel executions.

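**Check:** Count the ports frps is listening on:

```shell
sudo ss -tlnp | grep frps | wc -l
```

The count includes the frps port itself, so subtract one to get the number of allocated proxy ports.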

**Fix:**

* Widen the port range (e.g., 10000-60000) in the jump host firewall rules
* Update the worker configuration with the same new range
* Stop old executions that no longer need SSH

### frps high CPU usage

**Symptom:** Jump host CPU usage near 100%.

**Cause:** Many simultaneous SSH connections with high traffic.

**Fix:**

* Upgrade jump host to larger instance type
* Use multiple jump hosts with different port ranges
* Reduce number of parallel SSH sessions

## Security considerations

**Minimize port range exposure**: Open only as many ports as your peak number of parallel SSH-enabled executions requires (e.g., for at most 50 parallel jobs, `10000-10049` is enough), and use the same range in the worker configuration.

**Use IP allowlists**: Restrict the port range source to office networks or VPN instead of `0.0.0.0/0`.

**Monitor unusual activity**: Set up alerts for a spike in connections or for port scanning attempts.

**Regularly rotate the jump host**: Rebuild the jump host every few months as part of security maintenance.

**Limit jump host access**: Remove administrative SSH access (port 22) after initial setup, or restrict to specific IPs.

## Alternative: Bastion host pattern

If you already have a bastion host in your VPC, you can use it instead of a dedicated jump host:

1. Install frps on existing bastion host
2. Configure same firewall rules
3. Update worker configuration with bastion's IP addresses

The frps setup is the same—just use your existing bastion infrastructure.

## Cost considerations

**Jump host costs:**

* **AWS t3.micro:** \~$7/month
* **GCP e2-micro:** \~$6/month
* **Azure B1s:** \~$8/month

**Network transfer:**

* Minimal for SSH sessions (mostly text)
* IDE debugging with file sync may increase costs
* Watch network egress metrics in your provider's monitoring (AWS CloudWatch, Google Cloud Monitoring, Azure Monitor) for unexpected spikes

**Optimization:** Start with the smallest instance type. frps uses minimal resources unless it handles many simultaneous connections.

## Next steps

After completing this setup:

1. **Test with your team**: Have a few users verify SSH debugging works
2. **Document specifics**: Note jump host IP and port range in your internal docs
3. **Set up monitoring**: Configure alerts for frps downtime or connection issues
4. **Train users**: Share the [SSH Overview](/development-and-debugging/ssh-overview.md) guide with your team

Users can now debug executions by following:

* [SSH Overview](/development-and-debugging/ssh-overview.md) - General SSH debugging guide
* [VS Code Remote Debugging](/development-and-debugging/vs-code-remote-debugging.md) - VS Code IDE setup
* [PyCharm Remote Debugging](/development-and-debugging/pycharm-remote-debugging.md) - PyCharm IDE setup

