# Use .vhignore

Use `.vhignore` to keep files that are committed to your Git repository from being transferred to worker instances. This shortens repository fetches and avoids unnecessary file transfers.

## Why .vhignore?

A repository often contains files that belong in version control but aren't needed during execution. `.vhignore` lets you:

* **Reduce fetch times** by excluding large, unnecessary files
* **Save disk space** on workers
* **Keep tracked files in Git** while excluding them from executions

This is especially useful for documentation, test files, or large assets that should be versioned but aren't needed at runtime.

## .vhignore vs .gitignore

**`.gitignore`** – Prevents files from being tracked by Git

* Files are never committed
* Not visible in version control

**`.vhignore`** – Prevents files from being uploaded to workers

* Files are still committed to Git
* Just not transferred during execution

Valohai also respects `.gitignore` patterns, but `.vhignore` lets you exclude files that must stay committed.

## Commit Size Limits

Valohai enforces these limits:

* **Maximum compressed commit size:** 1 GB
* **Warning threshold:** 100 MB (larger commits may cause slow fetches or timeouts)

If you hit these limits, use `.vhignore` or move large files to data storage.
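To see how close a commit is to these limits before pushing, a rough local estimate can help. The sketch below is an approximation only: it assumes a gzip-compressed `git archive` is comparable to how Valohai measures commit size, which this page does not confirm. The function names `commit_size_bytes` and `largest_committed_files` are hypothetical helpers, not part of any Valohai tooling.

```shell
# Rough estimate: print the size in bytes of a gzip-compressed archive
# of the given commit (default: HEAD). Valohai's own measurement may differ.
commit_size_bytes() {
    git archive --format=tar.gz "${1:-HEAD}" | wc -c | tr -d ' '
}

# Print the N largest files in HEAD (default: 10), largest first.
# Each line is `git ls-tree -l` output: mode, type, object id, size, path.
largest_committed_files() {
    git ls-tree -r -l HEAD | sort -k4,4nr | head -n "${1:-10}"
}
```

Run `commit_size_bytes` from the repository root, then use `largest_committed_files 5` to see which files contribute most.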

## .vhignore Syntax

The syntax is identical to `.gitignore`:

```
# Ignore specific files
large-file.zip

# Ignore entire directories
docs/
tests/
.github/

# Ignore patterns
*.pdf
*.zip
*.tar.gz

# Negate a rule (include a file that would otherwise be ignored)
!important-config.yaml
```
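Because the syntax matches `.gitignore`, Git's own pattern matcher can serve as a rough local preview of what a `.vhignore` file would exclude. The sketch below assumes Valohai applies the patterns the same way Git does; `vhignore_preview` is a hypothetical helper name, not a Valohai command.

```shell
# List tracked files that match the patterns in a .vhignore-style file.
# Uses Git's ignore matcher as a stand-in for Valohai's (an assumption).
# Usage: vhignore_preview [path-to-ignore-file]   (default: .vhignore)
vhignore_preview() {
    git ls-files --cached --ignored --exclude-from="${1:-.vhignore}"
}
```

Run `vhignore_preview` from the repository root. Any file your step commands rely on should not appear in the output.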

## Common Patterns

Here are patterns most ML projects should include:

```
# System files (macOS, Windows, Linux)
.DS_Store
Thumbs.db
.cache/

# Documentation
docs/
*.md
README*
CHANGELOG*

# Testing and CI
tests/
.github/
.gitlab-ci.yml
.circleci/

# Development tools
.vscode/
.idea/
*.pyc
__pycache__/

# Large datasets (should be in data storage, not Git)
data/
datasets/
*.csv
*.parquet
*.h5
*.hdf5

# Archives
*.zip
*.tar
*.tar.gz
*.rar
*.7z

# Model checkpoints (if mistakenly committed)
checkpoints/
*.ckpt
*.pth
```

## When to Use .vhignore

**Use .vhignore for:**

* Documentation that's useful in Git but not at runtime
* Test files and CI/CD configurations
* Development tools (IDE configs, linters)
* Large static assets already in data storage
* Archives or compressed files

**Don't use .vhignore for:**

* Training datasets (keep them in a data store rather than committing and ignoring them)
* Model checkpoints (save them as execution outputs rather than committing them)
* Anything that affects execution results

## Example Setup

Here's a real-world `.vhignore` for an ML project:

```
# Documentation
docs/
*.md
LICENSE

# Testing
tests/
pytest.ini
.coverage

# CI/CD
.github/
.gitlab-ci.yml

# IDE
.vscode/
.idea/

# Notebooks (if used for exploration only)
notebooks/
*.ipynb

# System files
.DS_Store
__pycache__/
*.pyc

# Data files (stored in Valohai data store)
data/
datasets/
*.csv
*.parquet

# Archives
*.zip
*.tar.gz
```

## Add .vhignore to Your Repository

Create the file in your repository root:

```shell
# Create .vhignore
touch .vhignore

# Add patterns
echo "docs/" >> .vhignore
echo "tests/" >> .vhignore
echo "*.md" >> .vhignore

# Commit and push
git add .vhignore
git commit -m "Add .vhignore to reduce execution overhead"
git push
```

Then fetch the repository in Valohai:

1. Go to **Settings** → **Repository**
2. Click **Fetch Repository**

The next execution will respect your `.vhignore` patterns.

## Verify It Works

Check what was transferred to the worker:

```shell
# In your valohai.yaml step command
ls -lah /valohai/repository/
```

Files matching `.vhignore` patterns should be missing.
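To turn the manual listing into an automatic check, a step command could fail early when an excluded path is still present. This is a minimal sketch: `assert_excluded` and the `REPO_DIR` variable are hypothetical names introduced for illustration, and the default directory is the one used in the listing above.

```shell
# Return non-zero if any of the given paths exists under the repository directory.
# REPO_DIR is a hypothetical override; it defaults to /valohai/repository.
assert_excluded() {
    repo_dir="${REPO_DIR:-/valohai/repository}"
    status=0
    for path in "$@"; do
        if [ -e "$repo_dir/$path" ]; then
            echo "unexpectedly present: $path" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `assert_excluded docs tests` succeeds only when neither `docs` nor `tests` was transferred.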

## Troubleshooting

**"Files are still being transferred"**

* Make sure `.vhignore` is in the repository root
* Verify patterns use correct syntax (same as `.gitignore`)
* Remember to fetch the repository after committing `.vhignore`

**"Execution fails after adding .vhignore"**

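* You may have excluded a required file
* Check which files your step commands need
* Use `!pattern` to un-ignore specific files. In `.gitignore` syntax, a negation cannot re-include a file whose parent directory is excluded, so ignore `dir/*` rather than `dir/` when you need an exception inside it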

**"Commit is still too large"**

* `.vhignore` only excludes files from workers, not from Git
* Large files in Git history still count toward the size limit
* Move large files to data storage and remove from Git entirely
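To find out which files are inflating the history, you can list the largest blobs reachable from any ref. The sketch below uses only standard Git plumbing commands; `largest_blobs_in_history` is a hypothetical helper name. How Valohai itself accounts for history is not specified here, so treat the output as a guide to likely candidates.

```shell
# Print the N largest blobs reachable from any ref (default: 10).
# Each output line is the blob size in bytes followed by its path.
largest_blobs_in_history() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { print substr($0, 6) }' |
        sort -k1,1nr |
        head -n "${1:-10}"
}
```

Files that were deleted in a later commit still show up in this list, because they remain part of the history.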

## Best Practices

**Start with a comprehensive .vhignore**

* Add it early in your project
* Include common patterns (docs, tests, IDE configs)
* Update it as your project grows

**Don't over-exclude**

* Only ignore files you're certain aren't needed
* Test executions after adding new patterns
* Use negation (`!`) for exceptions

**Combine with .gitignore**

* Use `.gitignore` for files that should never be committed
* Use `.vhignore` for files in Git but not needed at runtime
* Both work together to keep things clean

**Document your patterns**

* Add comments in `.vhignore` explaining why files are excluded
* This makes the file easier for teammates to understand

## Next Steps

* [Manage Commits](/git-integration/manage-commits.md) to hide old branches
* [Connect Your Repository](/git-integration/connect-repo.md) if you haven't already
* [Data Management](/data/data-versioning.md) for handling large datasets

