Use .vhignore
Exclude files from being uploaded to Valohai workers
Use .vhignore to exclude files from your Git repository that shouldn't be uploaded to worker instances. This keeps your executions fast and prevents unnecessary file transfers.
Why .vhignore?
Git tracks every file you commit, but not every committed file is needed during execution. .vhignore lets you:
Reduce fetch times by excluding large, unnecessary files
Save disk space on workers
Keep tracked files in Git while excluding them from executions
This is especially useful for documentation, test files, or large assets that should be versioned but aren't needed at runtime.
.vhignore vs .gitignore
.gitignore – Prevents files from being tracked by Git
Files are never committed
Not visible in version control
.vhignore – Prevents files from being uploaded to workers
Files are still committed to Git
Just not transferred during execution
Valohai also respects .gitignore patterns, but .vhignore gives you more control: it lets you exclude files from executions while keeping them committed to Git.
Commit Size Limits
Valohai enforces these limits:
Maximum compressed commit size: 1 GB
Warning threshold: 100 MB (may cause slow fetches or timeouts)
If you hit these limits, use .vhignore or move large files to data storage.
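Before pushing, you can get a rough local estimate of a commit's compressed size. The sketch below is an approximation, not Valohai's own measurement. It gzips a tar of the committed tree produced by `git archive` and compares the byte count to the two thresholds above, assuming 1 GB means 1024^3 bytes and 100 MB means 100 × 1024^2 bytes. The function name `vh_commit_size_check` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: roughly estimate a commit's compressed size.
# Assumption: a gzipped tar of the committed tree approximates what Valohai
# measures; the real accounting may differ (for example, it may include history).
vh_commit_size_check() {
  local rev="${1:-HEAD}"
  local warn_bytes=$((100 * 1024 * 1024))   # assumed 100 MB warning threshold
  local limit_bytes=$((1024 * 1024 * 1024)) # assumed 1 GB hard limit
  local bytes

  # Fail early on an unknown revision instead of measuring an empty archive.
  if ! git rev-parse --verify --quiet "${rev}^{commit}" >/dev/null; then
    echo "unknown revision: ${rev}" >&2
    return 3
  fi

  # tr strips the padding some wc implementations (e.g. macOS) add.
  bytes=$(git archive --format=tar "$rev" | gzip -c | wc -c | tr -d ' ')

  if [ "$bytes" -ge "$limit_bytes" ]; then
    echo "over limit: ~${bytes} bytes compressed (${rev})"
    return 2
  elif [ "$bytes" -ge "$warn_bytes" ]; then
    echo "warning: ~${bytes} bytes compressed (${rev})"
    return 1
  fi
  echo "ok: ~${bytes} bytes compressed (${rev})"
}
```

Run `vh_commit_size_check` for HEAD, or pass a ref such as `origin/main`. Adding patterns to .vhignore will not lower this number, because .vhignore only affects what is transferred to workers.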
.vhignore Syntax
The syntax is identical to .gitignore:
# Ignore specific files
large-file.zip
# Ignore entire directories
docs/
tests/
.github/
# Ignore patterns
*.pdf
*.zip
*.tar.gz
# Negate a rule (include a file that would otherwise be ignored)
!important-config.yaml
Common Patterns
Here are patterns most ML projects should include:
# System files (macOS, Windows, Linux)
.DS_Store
Thumbs.db
.cache/
# Documentation
docs/
*.md
README*
CHANGELOG*
# Testing and CI
tests/
.github/
.gitlab-ci.yml
.circleci/
# Development tools
.vscode/
.idea/
*.pyc
__pycache__/
# Large datasets (should be in data storage, not Git)
data/
datasets/
*.csv
*.parquet
*.h5
*.hdf5
# Archives
*.zip
*.tar
*.tar.gz
*.rar
*.7z
# Model checkpoints (if mistakenly committed)
checkpoints/
*.ckpt
*.pth
*.h5
When to Use .vhignore
Use .vhignore for:
Documentation that's useful in Git but not at runtime
Test files and CI/CD configurations
Development tools (IDE configs, linters)
Large static assets already in data storage
Archives or compressed files
Don't use .vhignore for:
Training datasets (use data stores instead)
Model checkpoints (these should be execution outputs)
Anything that affects execution results
Example Setup
Here's a real-world .vhignore for an ML project:
# Documentation
docs/
*.md
LICENSE
# Testing
tests/
pytest.ini
.coverage
# CI/CD
.github/
.gitlab-ci.yml
# IDE
.vscode/
.idea/
# Notebooks (if used for exploration only)
notebooks/
*.ipynb
# System files
.DS_Store
__pycache__/
*.pyc
# Data files (stored in Valohai data store)
data/
datasets/
*.csv
*.parquet
# Archives
*.zip
*.tar.gz
Add .vhignore to Your Repository
Create the file in your repository root:
# Create .vhignore
touch .vhignore
# Add patterns
echo "docs/" >> .vhignore
echo "tests/" >> .vhignore
echo "*.md" >> .vhignore
# Commit and push
git add .vhignore
git commit -m "Add .vhignore to reduce execution overhead"
git push
Then fetch the repository in Valohai:
Go to Settings → Repository
Click Fetch Repository
The next execution will respect your .vhignore patterns.
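The `echo ... >>` commands above append a line every time they run, so rerunning them duplicates patterns. If you script this step, a small helper can append only the patterns that are missing. `vh_add_patterns` is a hypothetical name. The sketch assumes it runs inside your Git checkout and that .vhignore lives at the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append patterns to the root .vhignore, skipping duplicates.
# The body runs in a subshell ( ... ) so the cd does not leak into the caller.
vh_add_patterns() (
  cd "$(git rev-parse --show-toplevel)" || exit 1
  touch .vhignore
  # If the file does not end with a newline, add one so the next pattern
  # starts on its own line. $(...) drops a trailing newline, so the result
  # is empty only when the last byte already is a newline.
  if [ -s .vhignore ] && [ -n "$(tail -c 1 .vhignore)" ]; then
    printf '\n' >> .vhignore
  fi
  for pattern in "$@"; do
    # -F: literal match (so *.md is not a regex), -x: whole line only.
    if ! grep -qxF -- "$pattern" .vhignore; then
      printf '%s\n' "$pattern" >> .vhignore
    fi
  done
)
```

For example, `vh_add_patterns docs/ tests/ '*.md'`. Quote glob patterns so your shell does not expand them before they reach the file.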
Verify It Works
Check what was transferred to the worker:
# In your valohai.yaml step command
ls -lah /valohai/repository/
Files matching .vhignore patterns should be missing.
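You can also preview the effect locally before pushing. .vhignore uses .gitignore syntax, so Git's own matcher can list which tracked files the patterns exclude. We assume this mirrors what Valohai skips, so still confirm with `ls` in an execution. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list tracked files that the root .vhignore would exclude.
vh_preview_excluded() (
  # Run from the repository root so patterns containing a slash are
  # interpreted relative to the root, as they would be in a root .gitignore.
  cd "$(git rev-parse --show-toplevel)" || exit 1
  # -c: tracked (cached) files; -i: only those matched by the exclude patterns.
  git ls-files -c -i --exclude-from=.vhignore
)
```

The listing reflects the index, so run it after `git add`. Pipe it through `wc -l` for a quick count.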
Troubleshooting
"Files are still being transferred"
Make sure .vhignore is in the repository root
Verify patterns use correct syntax (same as .gitignore)
Remember to fetch the repository after committing .vhignore
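The first and last checks can be scripted. This hypothetical helper confirms that .vhignore exists at the root of a given commit and has no uncommitted edits. To check what has actually been pushed, pass a remote-tracking ref such as `origin/main` after running `git fetch`. The branch name is an assumption about your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that .vhignore is committed at the repository root.
# Usage: vh_check_vhignore [rev]   (rev defaults to HEAD)
vh_check_vhignore() (
  rev="${1:-HEAD}"
  cd "$(git rev-parse --show-toplevel)" || exit 1
  # <rev>:.vhignore names the file at the top level of that commit's tree.
  if ! git cat-file -e "${rev}:.vhignore" 2>/dev/null; then
    echo "missing: no .vhignore at the repository root in ${rev}"
    exit 1
  fi
  # Local edits that are not in <rev> are invisible to Valohai.
  if ! git diff --quiet "$rev" -- .vhignore; then
    echo "changed: working-tree .vhignore differs from ${rev}"
    exit 1
  fi
  echo "ok: .vhignore is committed at the repository root in ${rev}"
)
```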
"Execution fails after adding .vhignore"
You may have excluded a required file
Check your step commands: which files do they need?
Use !pattern to un-ignore specific files
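To find out whether a file your step needs is excluded, check it against the patterns directly. This hypothetical helper takes paths relative to the repository root, such as the scripts and config files named in your valohai.yaml step commands. It prints any tracked file under those paths that .vhignore matches. It only sees tracked files, and it assumes Git's .gitignore matching mirrors Valohai's.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given paths .vhignore would exclude.
# Usage: vh_check_required train.py config/ ...  (paths relative to repo root)
vh_check_required() (
  if [ "$#" -eq 0 ]; then
    echo "usage: vh_check_required <path>..." >&2
    exit 2
  fi
  cd "$(git rev-parse --show-toplevel)" || exit 1
  # With pathspecs, ls-files -c -i prints only tracked files under those
  # paths that match a .vhignore pattern; a directory reports its excluded files.
  excluded=$(git ls-files -c -i --exclude-from=.vhignore -- "$@") || exit 1
  if [ -n "$excluded" ]; then
    printf 'excluded by .vhignore:\n%s\n' "$excluded"
    exit 1
  fi
  echo "ok: none of the given paths are excluded"
)
```

If a needed file shows up, add a `!` rule for it after the pattern that matches it. In .gitignore syntax, a file cannot be re-included when one of its parent directories is excluded. In that case, narrow the directory pattern instead.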
"Commit is still too large"
.vhignore only excludes files from workers, not from Git
Large files in Git history still count toward the size limit
Move large files to data storage and remove from Git entirely
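To decide what to move, first find the biggest files anywhere in your history. This sketch uses the standard `git rev-list --objects` and `git cat-file --batch-check` plumbing to list the largest blobs across all refs. Sizes are uncompressed and in bytes, and the helper name is hypothetical. Removing a file from history means rewriting it with a dedicated tool such as git filter-repo, which this sketch does not do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the N largest file versions (blobs) in all history.
# Output: "<size in bytes> <path>", largest first. Sizes are uncompressed.
vh_largest_blobs() {
  local n="${1:-10}"
  # rev-list prints "<sha> <path>" for blobs and trees; cat-file adds type and
  # size, and %(rest) carries the path through. sed keeps blobs only.
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
    | sed -n 's/^blob //p' \
    | sort -k1,1nr \
    | head -n "$n"
}
```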
Best Practices
Start with a comprehensive .vhignore
Add it early in your project
Include common patterns (docs, tests, IDE configs)
Update it as your project grows
Don't over-exclude
Only ignore files you're certain aren't needed
Test executions after adding new patterns
Use negation (!) for exceptions
Combine with .gitignore
Use .gitignore for files that should never be committed
Use .vhignore for files in Git but not needed at runtime
Both work together to keep things clean
Document your patterns
Add comments in .vhignore explaining why files are excluded
Makes it easier for teammates to understand
Next Steps
Manage Commits to hide old branches
Connect Your Repository if you haven't already
Data Management for handling large datasets
