Use .vhignore

Exclude files from being uploaded to Valohai workers

Use .vhignore to keep files that are committed to your Git repository from being uploaded to worker instances. This keeps executions fast and avoids unnecessary file transfers.

Why .vhignore?

Git tracks every file you commit, but not all of those files are needed during execution. .vhignore lets you:

  • Reduce fetch times by excluding large, unnecessary files

  • Save disk space on workers

  • Keep tracked files in Git while excluding them from executions

This is especially useful for documentation, test files, or large assets that should be versioned but aren't needed at runtime.

.vhignore vs .gitignore

.gitignore – Prevents files from being tracked by Git

  • Files are never committed

  • Not visible in version control

.vhignore – Prevents files from being uploaded to workers

  • Files are still committed to Git

  • Just not transferred during execution

Valohai also respects .gitignore patterns, but .vhignore lets you exclude files that must stay committed to Git.

Commit Size Limits

Valohai enforces these limits:

  • Maximum compressed commit size: 1 GB

  • Warning threshold: 100 MB (larger commits may cause slow fetches or timeouts)

If you hit these limits, use .vhignore or move large files to data storage.
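
To get a rough idea of where a repository stands against these limits before pushing, the working tree can be compressed locally and the result compared with the two thresholds. The sketch below is only an approximation: how Valohai measures compressed commit size is not described here, so the gzip-tar method, the byte values chosen for "100 MB" and "1 GB", and the function names are all assumptions made for illustration. It also counts untracked files, which a real commit would not contain.

```python
import os
import tarfile
import tempfile

# Limits quoted in this section. Treating them as binary units is an assumption.
WARN_BYTES = 100 * 1024**2
MAX_BYTES = 1024**3


def classify_size(num_bytes: int) -> str:
    """Map a compressed size in bytes to 'ok', 'warning', or 'too large'."""
    if num_bytes > MAX_BYTES:
        return "too large"
    if num_bytes > WARN_BYTES:
        return "warning"
    return "ok"


def estimate_compressed_size(root: str) -> int:
    """Gzip-tar everything under root except .git and return the archive size."""

    def skip_git(info: tarfile.TarInfo):
        # Returning None drops the entry (and, for a directory, its contents).
        return None if ".git" in info.name.split("/") else info

    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "repo.tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(root, arcname="repo", filter=skip_git)
        return os.path.getsize(archive)
```

From the repository root, `classify_size(estimate_compressed_size("."))` gives a quick local verdict. Treat it as a hint rather than as the figure Valohai will report.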

.vhignore Syntax

The syntax is identical to .gitignore:

# Ignore specific files
large-file.zip

# Ignore entire directories
docs/
tests/
.github/

# Ignore patterns
*.pdf
*.zip
*.tar.gz

# Negate a rule (include a file that would otherwise be ignored)
!important-config.yaml
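
To make the matching rules concrete, here is a minimal Python sketch of how .gitignore-style patterns are commonly evaluated: the last matching rule wins, a trailing `/` restricts a rule to directories, and `!` re-includes a path unless one of its parent directories is already excluded. It is a simplified model and not Valohai's implementation. It handles only slash-free patterns such as the ones above, and `parse_rules` and `is_excluded` are hypothetical helper names.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple

Rule = Tuple[bool, bool, str]  # (negated, directory_only, glob)


def parse_rules(lines: Iterable[str]) -> List[Rule]:
    """Turn ignore-file lines into rules, skipping blanks and comments."""
    rules = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        negated = line.startswith("!")
        if negated:
            line = line[1:]
        dir_only = line.endswith("/")
        rules.append((negated, dir_only, line.rstrip("/")))
    return rules


def is_excluded(path: str, lines: Iterable[str]) -> bool:
    """Return True if a repository-relative file path would be excluded."""
    rules = parse_rules(lines)
    parts = path.split("/")
    for index, name in enumerate(parts):
        is_dir = index < len(parts) - 1  # every component but the last is a directory
        verdict = None
        for negated, dir_only, glob in rules:
            if dir_only and not is_dir:
                continue
            if fnmatchcase(name, glob):
                verdict = not negated  # later rules override earlier ones
        if verdict:
            # An excluded directory hides everything beneath it.
            return True
    return False
```

With the rules `["*.yaml", "!important-config.yaml", "docs/"]`, this model excludes `settings.yaml` and `docs/guide.txt` and keeps `important-config.yaml`. It still excludes `docs/important-config.yaml`, because the negation cannot reach into an excluded directory.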

Common Patterns

Here are patterns most ML projects should include:

# System files (macOS, Windows, Linux)
.DS_Store
Thumbs.db
.cache/

# Documentation
docs/
*.md
README*
CHANGELOG*

# Testing and CI
tests/
.github/
.gitlab-ci.yml
.circleci/

# Development tools
.vscode/
.idea/
*.pyc
__pycache__/

# Large datasets (should be in data storage, not Git)
data/
datasets/
*.csv
*.parquet
*.h5
*.hdf5

# Archives
*.zip
*.tar
*.tar.gz
*.rar
*.7z

# Model checkpoints (if mistakenly committed)
checkpoints/
*.ckpt
*.pth
*.h5

When to Use .vhignore

Use .vhignore for:

  • Documentation that's useful in Git but not at runtime

  • Test files and CI/CD configurations

  • Development tools (IDE configs, linters)

  • Large static assets already in data storage

  • Archives or compressed files

Don't use .vhignore for:

  • Training datasets (move them to data stores instead of committing and ignoring them)

  • Model checkpoints (save them as execution outputs instead of committing them)

  • Anything that affects execution results

Example Setup

Here's a real-world .vhignore for an ML project:

# Documentation
docs/
*.md
LICENSE

# Testing
tests/
pytest.ini
.coverage

# CI/CD
.github/
.gitlab-ci.yml

# IDE
.vscode/
.idea/

# Notebooks (if used for exploration only)
notebooks/
*.ipynb

# System files and Python bytecode caches
.DS_Store
__pycache__/
*.pyc

# Data files (stored in Valohai data store)
data/
datasets/
*.csv
*.parquet

# Archives
*.zip
*.tar.gz

Add .vhignore to Your Repository

Create the file in your repository root:

# Create .vhignore
touch .vhignore

# Add patterns
echo "docs/" >> .vhignore
echo "tests/" >> .vhignore
echo "*.md" >> .vhignore

# Commit and push
git add .vhignore
git commit -m "Add .vhignore to reduce execution overhead"
git push

Then fetch the repository in Valohai:

  1. Go to Settings > Repository

  2. Click Fetch Repository

The next execution will respect your .vhignore patterns.

Verify It Works

Check what was transferred to the worker:

# In your valohai.yaml step command
ls -lah /valohai/repository/

Files matching .vhignore patterns should be missing.
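
Instead of reading the directory listing by eye, the same check can be automated. This small sketch reports which paths that should have been excluded are still present. The helper name `find_leaked` and the example paths are hypothetical, and `/valohai/repository` is the directory used in the command above.

```python
import os
from typing import List


def find_leaked(repo_root: str, expected_absent: List[str]) -> List[str]:
    """Return the entries of expected_absent that still exist under repo_root."""
    return [
        rel for rel in expected_absent
        if os.path.exists(os.path.join(repo_root, rel))
    ]
```

Calling `find_leaked("/valohai/repository", ["docs", "tests", "README.md"])` inside an execution should return an empty list when those paths are covered by .vhignore. A non-empty result names the entries that were still transferred.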

Troubleshooting

"Files are still being transferred"

  • Make sure .vhignore is in the repository root

  • Verify patterns use correct syntax (same as .gitignore)

  • Remember to fetch the repository after committing .vhignore

"Execution fails after adding .vhignore"

  • You may have excluded a required file

  • Check which files your step commands need

  • Use !pattern to un-ignore specific files

"Commit is still too large"

  • .vhignore only excludes files from workers, not from Git

  • Large files in Git history still count toward the size limit

  • Move large files to data storage and remove from Git entirely
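
To decide which files to move to data storage first, it helps to list the biggest files in the working tree. The sketch below does that with the standard library only. It inspects the current checkout, not the Git history, so large files that were committed earlier and later deleted will not show up. `largest_files` is a hypothetical helper name.

```python
import os
from typing import List, Tuple


def largest_files(root: str, top: int = 10) -> List[Tuple[int, str]]:
    """Return (size_in_bytes, relative_path) for the biggest files under root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # skip symlinks, which may be broken or point outside root
            found.append((os.path.getsize(full), os.path.relpath(full, root)))
    return sorted(found, reverse=True)[:top]
```

Running `largest_files(".", top=5)` from the repository root returns the five largest files, biggest first.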

Best Practices

Start with a comprehensive .vhignore

  • Add it early in your project

  • Include common patterns (docs, tests, IDE configs)

  • Update it as your project grows

Don't over-exclude

  • Only ignore files you're certain aren't needed

  • Test executions after adding new patterns

  • Use negation (!) for exceptions

Combine with .gitignore

  • Use .gitignore for files that should never be committed

  • Use .vhignore for files in Git but not needed at runtime

  • Both work together to keep things clean

Document your patterns

  • Add comments in .vhignore explaining why files are excluded

  • Makes it easier for teammates to understand
