AWS Elastic File System

Mount AWS Elastic File System (EFS) to access shared network storage directly from Valohai executions.


Overview

AWS EFS provides managed NFS storage that you can mount in Valohai executions running in AWS environments (EC2 instances).

Use EFS to:

  • Access large datasets without downloading them first

  • Share preprocessed data across multiple executions

  • Cache intermediate results on fast shared storage

  • Process data in place and save versioned outputs

⚠️ Important: Files on EFS mounts are NOT versioned by Valohai. Always save final results to /valohai/outputs/ for reproducibility.


Prerequisites

Before mounting EFS in Valohai:

  1. EFS file system — Use an existing file system or create a new one in the AWS Console

  2. Same VPC or VPC peering — The EFS file system must be in the same VPC as your Valohai workers, or the two VPCs must be connected with VPC peering

  3. Security group access — Configure the EFS security group to allow inbound NFS traffic (port 2049) from the Valohai workers security group (sg-valohai-workers)

  4. DNS enabled — If connecting via DNS name, ensure DNS hostnames and DNS resolution are enabled in your VPC
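To confirm prerequisite 3 from inside a test execution, you can try opening a TCP connection to port 2049 on the EFS DNS name. The sketch below uses only the Python standard library. `nfs_port_reachable` is a hypothetical helper, not part of Valohai or AWS tooling. A successful connection only shows that NFS traffic gets through; it does not guarantee that the mount itself will succeed.

```python
import socket

NFS_PORT = 2049  # EFS accepts NFS traffic on TCP port 2049


def nfs_port_reachable(host: str, port: int = NFS_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures (socket.gaierror), refused connections and timeouts
        # are all subclasses of OSError
        return False
```

For example, `nfs_port_reachable("fs-1234aa62.efs.eu-west-1.amazonaws.com")` returns `False` in two cases: the name does not resolve, or the security group blocks the connection. A blocked port typically shows up as a timeout rather than a refusal.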


Setup: Configure EFS Access

Step 1: Find Your EFS Details

In the AWS Console:

  1. Go to EFS → File systems

  2. Find your file system

  3. Note the File system ID (e.g., fs-1234aa62)

  4. Note the DNS name (e.g., fs-1234aa62.efs.eu-west-1.amazonaws.com)

  5. Check the Mount targets tab to confirm that a mount target exists in the Availability Zones where Valohai workers run
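The DNS name follows a fixed pattern built from the file system ID and region, matching the example above. The sketch below builds it from those two values, assuming the standard commercial AWS partition (`amazonaws.com`). `efs_dns_name` is a hypothetical helper.

```python
import re

# EFS file system IDs look like "fs-" followed by lowercase hex characters
_FS_ID = re.compile(r"^fs-[0-9a-f]+$")


def efs_dns_name(file_system_id: str, region: str) -> str:
    """Build the regional EFS DNS name from a file system ID and an AWS region."""
    if not _FS_ID.match(file_system_id):
        raise ValueError(f"not an EFS file system ID: {file_system_id!r}")
    return f"{file_system_id}.efs.{region}.amazonaws.com"


print(efs_dns_name("fs-1234aa62", "eu-west-1"))
# → fs-1234aa62.efs.eu-west-1.amazonaws.com
```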


Step 2: Configure Security Group

  1. In the AWS Console, go to EC2 → Security Groups

  2. Find your EFS security group (or create one)

  3. Add inbound rule:

    • Type: NFS

    • Protocol: TCP

    • Port: 2049

    • Source: sg-valohai-workers (Valohai workers security group)

  4. Save rules
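If you manage security groups in code rather than in the console, you can create the same rule with boto3. The sketch below builds the `IpPermissions` value that `EC2.Client.authorize_security_group_ingress` expects. The helper name is hypothetical. The API takes the workers group's ID (of the form `sg-` plus hex characters), not its name.

```python
def nfs_ingress_rule(source_group_id: str) -> list[dict]:
    """Build an IpPermissions list allowing NFS (TCP 2049) from another security group."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 2049,
            "ToPort": 2049,
            # Reference the workers' security group instead of an IP range
            "UserIdGroupPairs": [
                {"GroupId": source_group_id, "Description": "NFS from Valohai workers"}
            ],
        }
    ]
```

To apply the rule, create a client with `ec2 = boto3.client("ec2")`. Then call `ec2.authorize_security_group_ingress(GroupId=efs_sg_id, IpPermissions=nfs_ingress_rule(workers_sg_id))`, where both IDs are your own.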


Step 3: Verify VPC Configuration

Ensure your VPC has DNS support enabled:

  1. Go to VPC → Your VPCs

  2. Select your VPC

  3. Click Actions → Edit VPC settings

  4. Verify both are enabled:

    • ✅ Enable DNS resolution

    • ✅ Enable DNS hostnames
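With both settings enabled, the EFS DNS name should resolve inside the VPC to the IP address of a mount target. You can run a quick check from a test execution using only the standard library. `resolve_host` is a hypothetical helper.

```python
import socket


def resolve_host(host: str) -> list[str]:
    """Return the unique IP addresses `host` resolves to, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # info[4] is the sockaddr tuple; its first element is the IP address
    return sorted({info[4][0] for info in infos})
```

If `resolve_host` returns an empty list for your EFS DNS name, check the VPC DNS settings above. The other common cause is a missing mount target in the worker's Availability Zone.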


Mount EFS in Execution

Basic Mount Configuration

valohai.yaml:

Parameters:

  • destination — Mount point inside container (e.g., /mnt/efs-data)

  • source — EFS DNS name with path (format: <file-system-id>.efs.<region>.amazonaws.com:/[path])

  • type — Always nfs for EFS

  • readonlytrue (recommended) or false
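Together, the four parameters above make up one mount entry. The hedged sketch below builds that entry as a Python dict with the same keys. Before you copy the values into valohai.yaml, it checks `source` against the documented format. The helper is hypothetical, not part of Valohai tooling.

```python
import posixpath
import re

# <file-system-id>.efs.<region>.amazonaws.com:/[path]
_EFS_SOURCE = re.compile(
    r"^fs-[0-9a-f]+\.efs\.[a-z0-9-]+\.amazonaws\.com:/.*$"
)


def efs_mount(destination: str, source: str, readonly: bool = True) -> dict:
    """Return a validated mount entry with destination/source/type/readonly keys."""
    if not posixpath.isabs(destination):
        raise ValueError("destination must be an absolute path inside the container")
    if _EFS_SOURCE.match(source) is None:
        raise ValueError(f"source does not match the EFS NFS format: {source!r}")
    return {
        "destination": destination,
        "source": source,
        "type": "nfs",  # EFS is always mounted over NFS
        "readonly": readonly,
    }


print(efs_mount("/mnt/efs-data", "fs-1234aa62.efs.eu-west-1.amazonaws.com:/"))
```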


Mount Specific EFS Directory

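Add a path after the colon in source to mount only that directory. For example, a source ending in :/ml-datasets/training mounts only the /ml-datasets/training directory from EFS.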
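When the source includes a path after the colon, that EFS directory becomes the root of the mount point. For example, a file stored at /ml-datasets/training/cats/001.jpg on EFS appears at `<destination>/cats/001.jpg`. The sketch below translates paths this way. The helper and the `/mnt/training` destination are hypothetical.

```python
import posixpath


def container_path(efs_path: str, source_path: str, destination: str) -> str:
    """Map an absolute EFS path to where it appears inside the container.

    source_path is the part of the mount source after the colon, e.g. "/ml-datasets/training".
    """
    efs_path = posixpath.normpath(efs_path)
    source_path = posixpath.normpath(source_path)
    rel = posixpath.relpath(efs_path, source_path)
    # Paths above the mounted directory are not visible inside the container
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{efs_path} is outside the mounted directory {source_path}")
    return posixpath.normpath(posixpath.join(destination, rel))
```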


Complete Workflow Example

Mount → Process → Save Pattern

Scenario: Preprocess a large image dataset stored on EFS and save the processed results to Valohai outputs.

valohai.yaml:

preprocess.py:
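The following is a minimal sketch of a preprocess.py for this scenario. It rests on several assumptions:

- EFS is mounted read-only at a hypothetical /mnt/efs-data destination.
- Valohai provides the outputs directory in the `VH_OUTPUTS_DIR` environment variable (the script falls back to `/valohai/outputs`).
- Dataset versions are requested through Valohai's `<file>.metadata.json` sidecar convention with a `valohai.dataset-versions` key. Confirm this convention against the Valohai datasets documentation.

The "processing" step is a placeholder byte copy.

```python
import json
import os
import shutil
import sys
from pathlib import Path

# Assumed locations: EFS mounted read-only at a hypothetical destination, and the
# outputs directory read from VH_OUTPUTS_DIR with /valohai/outputs as a fallback.
INPUT_DIR = Path("/mnt/efs-data/raw-images")
OUTPUT_DIR = Path(os.environ.get("VH_OUTPUTS_DIR", "/valohai/outputs"))
DATASET_VERSION = "dataset://imagenet-processed/batch-001"


def process_image(src: Path, dst: Path) -> None:
    """Placeholder transform that copies bytes; replace with real resizing or normalization."""
    shutil.copyfile(src, dst)


def main(input_dir: Path = INPUT_DIR, output_dir: Path = OUTPUT_DIR) -> int:
    """Process every .jpg under input_dir into output_dir and tag it for a dataset version."""
    count = 0
    for src in sorted(input_dir.rglob("*.jpg")):
        if not src.is_file():
            continue
        # Mirror the EFS directory layout under the outputs directory
        dst = output_dir / src.relative_to(input_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        process_image(src, dst)
        # Assumed sidecar convention: "<file>.metadata.json" listing dataset versions
        sidecar = dst.with_name(dst.name + ".metadata.json")
        sidecar.write_text(json.dumps({"valohai.dataset-versions": [DATASET_VERSION]}))
        count += 1
    print(f"Processed {count} files from {input_dir} into {output_dir}")
    return count


if __name__ == "__main__":
    if INPUT_DIR.is_dir():
        main()
    else:
        # Report a missing mount instead of writing an empty output set
        print(f"{INPUT_DIR} not found; check the EFS mount configuration", file=sys.stderr)
```

Reads come straight from the mount. Only the files written under the outputs directory are versioned.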

Result:

  • ✅ Raw images accessed from EFS (no download time)

  • ✅ Processed images saved to /valohai/outputs/ (versioned)

  • ✅ Dataset created for reproducible training

  • ✅ Can train on dataset://imagenet-processed/batch-001 anytime


Best Practices

Use Readonly for Input Data
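With `readonly: true`, any attempt to write to the mount fails, so shared input data cannot be modified by accident. A script can also verify this at startup. The sketch below (a hypothetical helper) relies on `os.access`, which reports a read-only file system as not writable.

```python
import os


def require_read_only(path: str) -> None:
    """Raise if an input mount is missing or writable (it should be mounted readonly)."""
    if not os.path.isdir(path):
        raise FileNotFoundError(f"input mount {path} is missing")
    # On a read-only mount, access(2) fails write checks with EROFS, so W_OK is False
    if os.access(path, os.W_OK):
        raise RuntimeError(f"{path} is writable; set readonly: true for this mount")
```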


Always Version Final Results
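Anything that stays only on EFS is unversioned, so copy final artifacts into the outputs directory before the execution ends. The sketch below assumes Valohai exposes the outputs directory as the `VH_OUTPUTS_DIR` environment variable, falling back to `/valohai/outputs`. `save_to_outputs` is hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def save_to_outputs(src: str, name: Optional[str] = None, outputs_dir: Optional[str] = None) -> Path:
    """Copy a file into the Valohai outputs directory so it is versioned with the execution."""
    # VH_OUTPUTS_DIR is assumed to point at /valohai/outputs inside executions
    root = Path(outputs_dir or os.environ.get("VH_OUTPUTS_DIR", "/valohai/outputs"))
    dst = root / (name or Path(src).name)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copy2 also keeps the source file's timestamps
    return dst
```

For example, `save_to_outputs("/mnt/efs-scratch/model.pt")` copies a checkpoint from a writable EFS mount into the outputs. The path is hypothetical.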


Structure Your EFS Data

Organize data logically for easier mounting and access control.


Monitor EFS Usage

Check EFS metrics in Amazon CloudWatch:

  • Burst credit balance — Ensure you're not exhausting bursting capacity

  • Throughput utilization — Check whether you are hitting throughput limits

  • IOPS utilization — Check file operation patterns
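You can also pull these metrics programmatically. The sketch below builds the arguments for boto3's CloudWatch `get_metric_statistics` call for the `BurstCreditBalance` metric in the `AWS/EFS` namespace, using the `FileSystemId` dimension. The helper name is hypothetical. Burst credits only apply to file systems in Bursting throughput mode.

```python
from datetime import datetime, timedelta, timezone


def burst_credit_query(file_system_id: str, hours: int = 24, period_s: int = 3600) -> dict:
    """Build keyword arguments for CloudWatch get_metric_statistics (BurstCreditBalance)."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EFS",
        "MetricName": "BurstCreditBalance",
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_s,  # seconds; a multiple of 60
        # The minimum shows the lowest balance reached in each period
        "Statistics": ["Minimum"],
    }
```

To fetch the datapoints, call `boto3.client("cloudwatch").get_metric_statistics(**burst_credit_query("fs-1234aa62"))`. To watch I/O limits on General Purpose file systems, swap `MetricName` for `PercentIOLimit`.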


Handle Mount Errors
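Checking the mount before doing any work turns a misconfigured mount into a clear error message instead of a confusing failure later in the script. The sketch below is minimal and uses `os.path.ismount`. The function name and messages are hypothetical.

```python
import os
from typing import Optional


def mount_problem(path: str) -> Optional[str]:
    """Return a description of what is wrong with a mount point, or None if it looks usable."""
    if not os.path.exists(path):
        return f"{path} does not exist; check the mount destination in valohai.yaml"
    if not os.path.ismount(path):
        return f"{path} exists but is not a mount point; the NFS mount may have failed"
    try:
        # For an input mount, an empty directory usually means a wrong path after the colon
        if not os.listdir(path):
            return f"{path} is mounted but empty; check the source path"
    except PermissionError:
        return f"{path} is mounted but not readable"
    return None
```

At the top of a script, call `problem = mount_problem("/mnt/efs-data")`, then `if problem: sys.exit(problem)`. This stops the execution with the message on stderr and a non-zero exit code.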


Maintaining Reproducibility

⚠️ Critical: EFS data can change between executions. Always save processed results to /valohai/outputs/ for versioning.

The problem:

The solution:

See: Access Network Storage for complete patterns.
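As a complement to saving processed results, you can also record which EFS input files an execution read, with their sizes and checksums, and save that record as an output. If the EFS data changes later, the manifest shows exactly what the execution saw. This is a minimal sketch using `hashlib`; the function names and manifest format are hypothetical. Hashing a large dataset reads every byte, so it consumes EFS throughput.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large EFS files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_input_manifest(input_dir: Path, manifest_path: Path) -> dict:
    """Record the size and SHA-256 of every file under input_dir in a JSON manifest."""
    entries = {
        str(p.relative_to(input_dir)): {"size": p.stat().st_size, "sha256": sha256_of(p)}
        for p in sorted(input_dir.rglob("*"))
        if p.is_file()
    }
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps(entries, indent=2, sort_keys=True))
    return entries
```

For example, `write_input_manifest(Path("/mnt/efs-data"), Path("/valohai/outputs/input-manifest.json"))` stores the manifest alongside the other versioned outputs. The mount path is hypothetical.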



Next Steps

  • Set up EFS in your AWS account (or use existing)

  • Configure security groups for Valohai access

  • Create test execution mounting EFS

  • Build pipeline: mount → process → save to outputs

  • Monitor EFS performance metrics
