On-Premises NFS

Mount on-premises network file systems to access existing data infrastructure directly from Valohai executions.


When to Use On-Premises NFS

On-premises NFS mounting serves a different purpose than cloud network storage:

Data Already Exists on Network Shares

Use when:

  • Large datasets already on corporate NFS servers

  • Legacy systems produce data on network shares

  • Multiple departments share data on existing file servers

  • Migrating terabytes of data is impractical

Example workflow:

  1. Medical imaging data lives on the hospital NFS

  2. Mount the volume to your execution

  3. Process the data while meeting compliance requirements

  4. Save results to outputs to start tracking them as datums

  5. Every saved output is versioned and tracked for audit


Data Compliance Requirements

Use when:

  • Healthcare data must stay in hospital network (HIPAA)

  • Financial data has regulatory restrictions (PCI DSS, GDPR)

  • Government data cannot leave controlled environment

  • Corporate policy prohibits cloud data storage


Hybrid Cloud Strategy

Use when:

  • Transitioning gradually to cloud

  • Need access to both on-prem and cloud data

  • Want to keep sensitive data on-prem while using cloud compute

  • Cost optimization (avoid cloud storage costs for large static datasets)


Critical Trade-Off: Speed vs. Versioning

⚠️ Important: Valohai does NOT version or track files on mounted network storage.

What this means:

  • Files read from mounts: Not versioned

  • Files written to mounts: Not versioned

  • Files saved to /valohai/outputs/: Versioned ✅

Decision Tree: Should I Use NFS Mounts?


On-Prem NFS vs. Valohai Inputs

| Feature | On-Prem NFS Mount | Valohai Inputs |
|---|---|---|
| Versioning | ❌ No tracking | ✅ Full versioning |
| Reproducibility | ❌ Data can change | ✅ Immutable references |
| Data location | ✅ Stays on-premises | ❌ Must be in cloud storage |
| Setup complexity | ⚠️ Network + VPN config | ✅ Simple |
| Speed | ⚠️ Depends on network | ✅ Fast (cloud-native) |
| Best for | Existing on-prem data, compliance | All other cases |
| Compliance | ✅ Data never leaves premises | ❌ Data moves to cloud |


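Always save processed results to /valohai/outputs/ for versioning.

Why this matters: files on a mount are not tracked and can change after an execution reads them, while files saved to /valohai/outputs/ are versioned.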


Prerequisites

Before mounting on-premises NFS in Valohai:

  1. Network connectivity — Valohai execution environments must reach your on-prem NFS server

  2. VPN or Direct Connect — Secure connection between cloud and on-premises network

  3. NFS server accessible — NFS service running and accessible from Valohai worker IPs

  4. Firewall rules — Allow NFS traffic from Valohai workers

  5. Mount permissions — NFS export configured to allow access from Valohai workers
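Before attempting a mount, it can help to confirm that the NFS port is reachable from inside an execution. The following is a minimal sketch of such a check, not an official Valohai tool. It assumes the server listens on the standard NFS port 2049; NFSv3 setups may also need rpcbind (port 111) and mountd to be reachable.

```python
import socket

NFS_PORT = 2049  # standard NFS port; assumed to be the one your server uses


def can_reach(host: str, port: int = NFS_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling `can_reach("nfs.example.internal")` (a hypothetical host name) at the start of a small test execution shows whether the VPN and firewall rules are in place. A successful TCP connection does not prove that the export permissions are correct, only that the network path is open.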


Mount On-Premises NFS in Execution

Basic Mount Configuration

valohai.yaml:

For networked NFS server:

Parameters:

  • destination — Mount point inside container (e.g., /mnt/company-data)

  • source — NFS path (format: <server>:<export-path> or local mount path)

  • type — nfs when specifying a remote server

  • readonly — true (recommended) or false
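The parameters above can be checked before they go into a configuration file. The sketch below validates the `<server>:<export-path>` format and returns a dictionary with the four keys listed above. The function names are hypothetical helpers, not part of any Valohai library, and the host name in the usage note is made up. IPv6 server addresses are not handled.

```python
def parse_nfs_source(source: str) -> tuple:
    """Split '<server>:<export-path>' into (server, export_path)."""
    server, sep, export_path = source.partition(":")
    if not sep or not server or not export_path.startswith("/"):
        raise ValueError(f"expected '<server>:<export-path>', got {source!r}")
    return server, export_path


def build_nfs_mount(destination: str, source: str, readonly: bool = True) -> dict:
    """Return a mount entry with the four documented keys after basic checks."""
    if not destination.startswith("/"):
        raise ValueError("destination must be an absolute path inside the container")
    parse_nfs_source(source)  # raises ValueError if the format is wrong
    return {
        "destination": destination,
        "source": source,
        "type": "nfs",
        "readonly": readonly,  # read-only by default, as recommended above
    }
```

For example, `build_nfs_mount("/mnt/company-data", "nfs.example.internal:/exports/data")` returns an entry with `type` set to `nfs` and `readonly` set to `True`, while a source without a server part raises `ValueError`.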


Mount Specific NFS Directory

Mounts only a specific subdirectory from your NFS server.


Complete Workflow Example

Mount → Process → Save Pattern

Scenario: Process medical imaging from hospital NFS, extract features, save to Valohai outputs for compliance tracking.

valohai.yaml:

process_scans.py:

Result:

  • ✅ Medical scans accessed from on-prem NFS (data never leaves hospital network)

  • ✅ De-identified metadata and features saved to /valohai/outputs/ (versioned, compliant)

  • ✅ Dataset created for reproducible analysis

  • ✅ Audit trail maintained with source tracking
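The mount → process → save pattern can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the original process_scans.py: the feature extraction is a placeholder (file size and a content digest), the output file name `scan_features.csv` is made up, and the mount path is passed in by the caller.

```python
import csv
import hashlib
from pathlib import Path


def extract_features(path: Path) -> dict:
    """Placeholder feature extraction: file size and a content digest only."""
    data = path.read_bytes()  # reads the whole file; stream it for very large scans
    return {"size_bytes": len(data), "sha256": hashlib.sha256(data).hexdigest()}


def process_scans(mount_dir: str, output_dir: str = "/valohai/outputs") -> Path:
    """Read every file under mount_dir and write a feature table to output_dir."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "scan_features.csv"

    files = sorted(p for p in Path(mount_dir).rglob("*") if p.is_file())
    with out_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["scan_index", "size_bytes", "sha256"])
        writer.writeheader()
        for index, path in enumerate(files):
            # File names are deliberately left out of the output table.
            writer.writerow({"scan_index": index, **extract_features(path)})
    return out_path
```

In an execution this might be called as `process_scans("/mnt/hospital-scans")`, where the mount point is a hypothetical `destination` value. Only the derived table is written to /valohai/outputs/; the source files stay on the mount.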


Readonly vs. Writeable Mounts

Readonly Mounts (Recommended)

Use when:

  • Accessing shared reference data

  • Reading large datasets for processing

  • Multiple executions need same data

  • Want to prevent accidental modifications

Benefits:

  • ✅ Prevents accidental data corruption

  • ✅ Safe for parallel executions

  • ✅ Clear intent (read-only access)


Writeable Mounts (Use Carefully)

Use when:

  • Need shared scratch space for intermediate results

  • Writing temporary files shared across parallel workers

  • Caching expensive computations

Risks:

  • ⚠️ Files written here are NOT versioned

  • ⚠️ Parallel executions can conflict

  • ⚠️ No automatic cleanup

Best practice: Use writeable mounts for temporary data only. Always save final results to /valohai/outputs/.
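When parallel executions share a writeable mount, one way to reduce conflicts is to write each file under a unique temporary name and rename it into place once it is complete. The sketch below shows that pattern. It assumes that a rename within a single export is atomic on your NFS server, which is commonly the case but worth verifying for your setup. The helper name is hypothetical.

```python
import os
import uuid
from pathlib import Path


def write_atomically(target: Path, data: bytes) -> None:
    """Write data to a unique temp file next to target, then rename it into place."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # A random suffix keeps parallel writers from sharing a temp file.
    tmp = target.with_name(f".{target.name}.{uuid.uuid4().hex}.tmp")
    try:
        with tmp.open("wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes out before renaming
        os.replace(tmp, target)  # readers see the old file or the new one
    finally:
        if tmp.exists():
            tmp.unlink()  # clean up if the write or rename failed
```

This does not decide which writer wins when two executions target the same file; the last rename does. Including an execution-specific identifier in the target file name avoids that question entirely.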


Best Practices

Use Readonly for Sensitive Data


Always Version Processed Results


Maintaining Reproducibility

⚠️ Critical: On-premises data can change. Always save processed results to /valohai/outputs/ for versioning and audit trails.

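The problem: files on the NFS share can be modified, replaced, or deleted after an execution reads them, so rerunning the same execution later may not see the same data.

The solution: save processed results, along with a record of which source files were used, to /valohai/outputs/ so they are versioned.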
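One possible way to keep a record of the mounted data a run depended on is to write a checksum manifest to the outputs directory. The sketch below is an assumption-laden illustration: the manifest file name `input_manifest.json` and the function names are made up, and hashing every file can be slow on large shares, so in practice you may want to hash only the files the execution reads.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(mount_dir: str, output_dir: str = "/valohai/outputs") -> Path:
    """Save a {relative path: sha256} map of the mounted files to output_dir."""
    root = Path(mount_dir)
    manifest = {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "input_manifest.json"
    out_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return out_path
```

Comparing the manifests of two executions shows whether the source data changed between them. Note that the manifest contains relative file names, so leave those out or replace them with indices if the names themselves are sensitive.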



Next Steps

  • Set up VPN or Direct Connect between cloud and on-premises

  • Configure NFS exports and firewall rules

  • Test connectivity with small execution

  • Build pipeline: mount → process → save to outputs

  • Document compliance and data handling procedures

  • Monitor network performance and optimize access patterns
