AWS Elastic File System
Mount AWS Elastic File System (EFS) to access shared network storage directly from Valohai executions.
Overview
AWS EFS provides managed NFS storage that you can mount in Valohai executions running in AWS environments (EC2 instances).
Use EFS to:
Access large datasets without downloading
Share preprocessed data across multiple executions
Cache intermediate results on fast shared storage
Process data in place and save versioned outputs
⚠️ Important: Files on EFS mounts are NOT versioned by Valohai. Always save final results to /valohai/outputs/ for reproducibility.
Prerequisites
Before mounting EFS in Valohai:
Existing EFS file system — Use an existing EFS or create a new one in AWS Console
Same VPC or VPC peering — EFS must be in the same VPC as Valohai resources, or set up VPC peering between VPCs
Security group access — Configure the EFS security group to allow inbound NFS traffic (port 2049) from the Valohai workers security group (sg-valohai-workers)
DNS enabled — If connecting via DNS name, ensure DNS hostnames and DNS resolution are enabled in your VPC
Setup: Configure EFS Access
Step 1: Find Your EFS Details
In AWS Console:
Go to EFS → File systems
Find your file system
Note the File system ID (e.g., fs-1234aa62)
Note the DNS name (e.g., fs-1234aa62.efs.eu-west-1.amazonaws.com)
Check the Mount targets tab for availability zone placement
Step 2: Configure Security Group
In AWS Console, go to EC2 → Security Groups
Find your EFS security group (or create one)
Add inbound rule:
Type: NFS
Protocol: TCP
Port: 2049
Source: sg-valohai-workers (Valohai workers security group)
Save rules
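Once the inbound rule is saved, a quick reachability check run from a machine in the workers security group can confirm that port 2049 is open. The following is a minimal sketch using only the Python standard library. The function name can_reach_nfs is hypothetical, and the host you pass in would be your own EFS DNS name or mount target IP.

```python
import socket


def can_reach_nfs(host: str, port: int = 2049, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

For example, can_reach_nfs("fs-1234aa62.efs.eu-west-1.amazonaws.com") returning False suggests that either the security group rule or the VPC routing still needs attention. It does not tell you which one.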
Step 3: Verify VPC Configuration
Ensure your VPC has DNS support enabled:
Go to VPC → Your VPCs
Select your VPC
Click Actions → Edit VPC settings
Verify both are enabled:
✅ Enable DNS resolution
✅ Enable DNS hostnames
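With both settings enabled, the EFS DNS name should resolve to a mount target IP from inside the VPC. The sketch below checks this with the Python standard library. The helper name resolve_host is hypothetical, and an empty result only indicates that resolution failed, not why.

```python
import socket
from typing import List


def resolve_host(hostname: str) -> List[str]:
    """Return the sorted unique IP addresses for hostname, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, 2049, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name could not be resolved (for example, VPC DNS support is off).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Running resolve_host on your EFS DNS name from an instance in the same VPC should return at least one private IP address.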
Mount EFS in Execution
Basic Mount Configuration
valohai.yaml:
Parameters:
destination — Mount point inside container (e.g., /mnt/efs-data)
source — EFS DNS name with path (format: <file-system-id>.efs.<region>.amazonaws.com:/[path])
type — Always nfs for EFS
readonly — true (recommended) or false
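To make the source format concrete, here is a small Python sketch that assembles the source string and a mount entry with the four fields listed above. The helper names efs_source and mount_entry are hypothetical, and the file system ID check assumes IDs of the form fs- followed by hexadecimal digits. Where the entry sits inside valohai.yaml is not shown here.

```python
import re

# Assumed shape of an EFS file system ID, e.g. fs-1234aa62.
_FS_ID = re.compile(r"^fs-[0-9a-f]+$")


def efs_source(file_system_id: str, region: str, path: str = "/") -> str:
    """Build '<file-system-id>.efs.<region>.amazonaws.com:/[path]'."""
    if not _FS_ID.match(file_system_id):
        raise ValueError(f"unexpected file system ID: {file_system_id!r}")
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    return f"{file_system_id}.efs.{region}.amazonaws.com:{path}"


def mount_entry(destination: str, source: str, readonly: bool = True) -> dict:
    """Return a mount entry using the four documented fields."""
    return {
        "destination": destination,
        "source": source,
        "type": "nfs",  # always nfs for EFS
        "readonly": readonly,
    }
```

Passing path="/ml-datasets/training" to efs_source yields a source that exposes only that directory at the destination.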
Mount Specific EFS Directory
Mounts only the /ml-datasets/training directory from EFS.
Complete Workflow Example
Mount → Process → Save Pattern
Scenario: Preprocess a large image dataset stored on EFS and save the processed results to Valohai outputs.
valohai.yaml:
preprocess.py:
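The original listing is not reproduced here. As a stand-in, the following is a minimal sketch of the mount, process, save pattern. It assumes the EFS mount point is /mnt/efs-data and uses a plain file copy as a placeholder for real image processing. The function name preprocess and its parameters are illustrative, and dataset creation is not covered.

```python
import shutil
from pathlib import Path


def preprocess(src_dir: str = "/mnt/efs-data",
               out_dir: str = "/valohai/outputs",
               suffixes: tuple = (".jpg", ".jpeg", ".png")) -> int:
    """Copy image files from the mount into the outputs directory.

    Returns the number of files written.
    """
    src = Path(src_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            target = out / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            # Placeholder: replace the copy with resizing, normalization, etc.
            shutil.copy2(path, target)
            count += 1
    return count
```

Because the function only reads from src_dir, it works with a read-only mount.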
Result:
✅ Raw images accessed from EFS (no download time)
✅ Processed images saved to /valohai/outputs/ (versioned)
✅ Dataset created for reproducible training
✅ Can train on dataset://imagenet-processed/batch-001 anytime
Best Practices
Use Readonly for Input Data
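A read-only mount protects shared input data from accidental writes by an execution. As a sanity check inside your script, the sketch below reads the file system flags for a path on Unix-like systems. The helper name is_readonly_mount is hypothetical, and /mnt/efs-data is the example mount point used elsewhere on this page.

```python
import os


def is_readonly_mount(path: str) -> bool:
    """Return True if the file system containing path is mounted read-only."""
    # statvfs reports mount flags; ST_RDONLY is set for read-only mounts.
    flags = os.statvfs(path).f_flag
    return bool(flags & os.ST_RDONLY)
```

A script that expects input-only data could log a warning when is_readonly_mount("/mnt/efs-data") returns False.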
Always Version Final Results
Structure Your EFS Data
Organize data logically for easier mounting and access control.
Monitor EFS Usage
Check EFS metrics in AWS CloudWatch:
Burst credit balance — Ensure you're not exhausting bursting capacity
Throughput utilization — Monitor if hitting limits
IOPS utilization — Check file operation patterns
Handle Mount Errors
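If the mount is missing or unreachable, it is usually better to fail early with a clear message than to continue with an empty directory. The following is one possible sketch. MountError and require_mount are hypothetical names, and the check relies only on the Python standard library.

```python
import os


class MountError(RuntimeError):
    """Raised when an expected network mount is not usable."""


def require_mount(path: str, must_be_mountpoint: bool = True) -> None:
    """Raise MountError unless path is a readable (mounted) directory."""
    if not os.path.isdir(path):
        raise MountError(f"{path} does not exist or is not a directory")
    if must_be_mountpoint and not os.path.ismount(path):
        raise MountError(f"{path} exists but nothing is mounted there")
    try:
        # Listing forces an actual round trip to the file system.
        os.listdir(path)
    except OSError as exc:
        raise MountError(f"{path} is not readable: {exc}") from exc
```

Calling require_mount("/mnt/efs-data") at the top of a script stops the execution before any processing starts if the mount failed.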
Maintaining Reproducibility
⚠️ Critical: EFS data can change between executions. Always save processed results to /valohai/outputs/ for versioning.
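The problem: An execution that reads directly from the EFS mount keeps no record of which file contents it used, so rerunning it later may produce different results.
The solution: Write the processed data to /valohai/outputs/ so that Valohai versions it and later executions can use the versioned copy.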
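One complementary way to record what an execution actually read from EFS is to write a checksum manifest alongside the processed results. This is a sketch under stated assumptions rather than a Valohai feature. The function name write_manifest and the file name efs-manifest.json are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def _sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(src_dir: str, out_dir: str = "/valohai/outputs",
                   name: str = "efs-manifest.json") -> Path:
    """Write a JSON map of relative path -> SHA-256 for every file under src_dir."""
    src = Path(src_dir)
    entries = {
        path.relative_to(src).as_posix(): _sha256(path)
        for path in sorted(src.rglob("*"))
        if path.is_file()
    }
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = out / name
    manifest.write_text(json.dumps(entries, indent=2, sort_keys=True))
    return manifest
```

Comparing manifests from two executions shows whether the EFS contents changed between them.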
See: Access Network Storage for complete patterns.
Related Pages
Access Network Storage — Overview and when to use NFS
Google Cloud Filestore — GCP equivalent
On-Premises NFS — Mount on-prem storage
Load Data in Jobs — Alternative: Valohai's versioned inputs
Next Steps
Set up EFS in your AWS account (or use existing)
Configure security groups for Valohai access
Create a test execution that mounts EFS
Build pipeline: mount → process → save to outputs
Monitor EFS performance metrics
