# Git Submodules

Git submodules let you include one repository as a subdirectory within another. This is useful when your ML project depends on shared code, models, or configurations stored in separate repositories.

## What Are Git Submodules?

[Submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules) are pointers to specific commits in external repositories. They let you:

* Reuse code across multiple projects without duplication
* Version external dependencies alongside your project
* Keep repositories separate while maintaining relationships

**When to use submodules:**

* Shared utility libraries used by multiple ML projects
* Common preprocessing pipelines
* Vendored dependencies that need version tracking

**When not to use submodules:**

* Frequent changes to both repos (consider merging instead)
* Simple one-time code sharing (just copy the files)
* External packages available via pip/conda

## Add a Submodule

Add a submodule to your repository:

```shell
# Add submodule in a subdirectory
git submodule add git@github.com:username/shared-utils.git libs/shared-utils

# Commit the submodule reference
git add .gitmodules libs/shared-utils
git commit -m "Add shared-utils submodule"
git push
```

Your repository now contains:

* `.gitmodules` – Configuration file listing submodules
* `libs/shared-utils/` – Directory holding the submodule checkout; the main repo records only the commit it points to
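To see what Git actually records, the following sketch builds a throwaway sandbox with two local repositories and inspects the result. The sandbox paths and the stand-in `shared-utils` repository are assumptions made for illustration. The `protocol.file.allow=always` setting is needed only because the demo uses a local path as the submodule URL; it is not required for SSH URLs.

```shell
#!/bin/sh
set -eu

# Disposable sandbox directory (hypothetical location chosen by mktemp)
demo=$(mktemp -d)
cd "$demo"

# Stand-in for the shared repository, with a single empty commit
git init -q shared-utils
git -C shared-utils -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "Initial commit"

# Main repository that adds the stand-in as a submodule
git init -q main-repo
cd main-repo
git -c protocol.file.allow=always submodule --quiet add "$demo/shared-utils" libs/shared-utils

# The submodule is staged as a "gitlink" entry (mode 160000) holding one commit hash
mode=$(git ls-files --stage libs/shared-utils | cut -d' ' -f1)
echo "index mode: $mode"

# .gitmodules maps the path to its URL
git config --file .gitmodules --get submodule.libs/shared-utils.path
```

The mode `160000` marks an index entry that stores a commit from another repository rather than file contents, which is why the main repo stays small even when the submodule is large.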

## Configure Valohai Access

Valohai needs access to both your main repository and all submodule repositories.

### Option 1: Use the Same SSH Key (Recommended)

The simplest approach is to use one SSH key across all repositories.

**For GitHub:** GitHub deploy keys are repository-specific, so you can't reuse them. Instead:

1. Generate an SSH key pair in Valohai
2. Add the **public key** to your GitHub account (not as a deploy key):
   * Go to **Settings** (your account) → **SSH and GPG keys** → **New SSH key**
3. Add the **private key** to Valohai project settings

Now Valohai can access your main repo and all submodules.

**For GitLab:** GitLab allows shared deploy keys:

1. Add the deploy key to your main repository
2. Go to each submodule repository
3. Navigate to **Settings** → **Repository** → **Deploy keys** → **Privately accessible deploy keys**
4. Enable the same key for each submodule

**For Bitbucket:** Bitbucket allows reusing deploy keys:

1. Add the same deploy key to your main repository
2. Add the same deploy key to each submodule repository

No further configuration is needed once the key has been added to each repository.

### Option 2: Clone Submodules During Execution

If you can't use the same SSH key, clone submodules manually in your step commands:

```yaml
- step:
    name: train-with-submodules
    image: python:3.11
    command:
      - apt-get update && apt-get install -y git
      - printf '%b\n' "$SUBMODULE_KEY" > ~/submodule_key
      - chmod 600 ~/submodule_key
      - export GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=no -i ~/submodule_key"
      - git submodule update --init --recursive
      - python train.py
```

Store `SUBMODULE_KEY` as a secret environment variable in project settings.

## Working with Submodules

After adding a submodule, team members must initialize it:

```shell
# Clone the main repository
git clone git@github.com:username/main-repo.git
cd main-repo

# Initialize submodules
git submodule update --init --recursive
```

**Update a submodule to a newer commit:**

```shell
cd libs/shared-utils
git checkout main   # a freshly initialized submodule is on a detached HEAD
git pull origin main
cd ../..
git add libs/shared-utils
git commit -m "Update shared-utils to latest"
git push
```

**Valohai automatically fetches submodules** if your SSH key has access to all repositories. Note that the repository size limits still apply:

* **Maximum compressed commit size:** 1 GB
* **Warning threshold:** 100 MB (may cause slow fetches or timeouts)

## Submodule URL Formats

Submodules must use SSH format for private repositories:

```text
# Correct - SSH format
git@github.com:username/repo.git

# Incorrect - HTTPS won't work with deploy keys
https://github.com/username/repo.git
```

Check your `.gitmodules` file:

```ini
[submodule "libs/shared-utils"]
    path = libs/shared-utils
    url = git@github.com:username/shared-utils.git
```
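If `.gitmodules` still lists an HTTPS URL, it can be rewritten into SSH form. The sketch below uses a hypothetical helper, `to_ssh_url`, that is not part of Git or Valohai; it assumes the common `https://host/owner/repo.git` layout and leaves any other input unchanged.

```shell
# Hypothetical helper: rewrite an HTTPS remote URL into SSH (scp-like) form.
to_ssh_url() {
  printf '%s\n' "$1" | sed -E 's#^https://([^/]+)/(.+)$#git@\1:\2#'
}

to_ssh_url "https://github.com/username/shared-utils.git"
# prints: git@github.com:username/shared-utils.git

# To apply the result inside a repository (path taken from the example above):
#   git submodule set-url libs/shared-utils "$(to_ssh_url "$old_url")"
#   git submodule sync --recursive
```

After changing the URL, commit the updated `.gitmodules` so that Valohai and your teammates pick up the SSH address.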

## Troubleshooting

**"Submodule initialization failed"**

* Valohai's SSH key doesn't have access to the submodule repository
* Solution: Add the public key to the submodule repo (see provider-specific guides)

**"Submodule directory is empty"**

* The submodule wasn't initialized
* Solution: Make sure `.gitmodules` is committed and Valohai has fetched the latest commit

**"Permission denied for submodule"**

* SSH key works for main repo but not submodules
* Solution: Use a single SSH key with access to all repos (see Option 1 above)

**"Submodule points to wrong commit"**

* You updated the submodule but didn't commit the reference
* Solution: After updating a submodule, commit the change in the main repo
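Several of these problems can be spotted locally before pushing. `git submodule status` prefixes each line with one character: `-` for a submodule that is not initialized, `+` when the checked-out commit differs from the one recorded in the main repo, and `U` for merge conflicts. The sketch below turns that prefix into a readable message; `describe_submodule_state` is a hypothetical helper name, not a Git command.

```shell
# Hypothetical helper: explain the one-character prefix of a `git submodule status` line.
describe_submodule_state() {
  case "$1" in
    -*) echo "not initialized" ;;
    +*) echo "checked-out commit differs from the commit recorded in the main repo" ;;
    U*) echo "merge conflict" ;;
    *)  echo "in sync" ;;
  esac
}

# Only query Git when run inside a work tree.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  git submodule status --recursive | while IFS= read -r line; do
    printf '%s => %s\n' "$line" "$(describe_submodule_state "$line")"
  done
fi
```

A `+` line usually means the new submodule commit still needs `git add` and `git commit` in the main repo; a `-` line means `git submodule update --init --recursive` has not been run.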

## Alternative: Clone During Execution

If submodules are too complex, consider cloning additional repositories during execution:

```yaml
- step:
    name: train-with-external-code
    image: python:3.11
    command:
      - apt-get update && apt-get install -y git
      - git clone https://github.com/username/shared-utils.git /shared-utils
      - python train.py
```

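This is simpler but loses version tracking: each execution clones whatever the default branch contains at that moment, so runs are not pinned to a known commit. See [Clone Repositories During Execution](https://docs.valohai.com/git-integration/advanced-topics/clone-during-execution) for details.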

## Best Practices

**Pin submodules to specific commits**

* A submodule always records one exact commit; avoid workflows that automatically move it to the tip of `main` or `master`
* Rely on the recorded commit hash for reproducibility
* Update deliberately, not automatically

**Keep submodules stable**

* Submodules should change infrequently
* Frequent changes indicate the code should be merged into the main repository

**Document submodule setup**

* Add initialization instructions to your README
* Explain why each submodule exists

**Use shallow clones for large submodules**

```shell
git submodule update --init --depth 1
```

## Next Steps

* [Clone Repositories During Execution](https://docs.valohai.com/git-integration/advanced-topics/clone-during-execution) for simpler alternatives
* [GitHub Private Repos](https://docs.valohai.com/git-integration/private-repositories/github) for SSH key setup
* [GitLab Private Repos](https://docs.valohai.com/git-integration/private-repositories/gitlab) for shared deploy keys

