Git Submodules
Access multiple Git repositories from the same Valohai project
Git submodules let you include one repository as a subdirectory within another. This is useful when your ML project depends on shared code, models, or configurations stored in separate repositories.
What Are Git Submodules?
Submodules are pointers to specific commits in external repositories. They let you:
Reuse code across multiple projects without duplication
Version external dependencies alongside your project
Keep repositories separate while maintaining relationships
When to use submodules:
Shared utility libraries used by multiple ML projects
Common preprocessing pipelines
Vendored dependencies that need version tracking
When not to use submodules:
Frequent changes to both repos (consider merging instead)
Simple one-time code sharing (just copy the files)
External packages available via pip/conda
Add a Submodule
Add a submodule to your repository:
# Add submodule in a subdirectory
git submodule add git@github.com:username/shared-utils.git libs/shared-utils
# Commit the submodule reference
git add .gitmodules libs/shared-utils
git commit -m "Add shared-utils submodule"
git push
Your repository now contains:
.gitmodules – Configuration file listing submodules
libs/shared-utils/ – Directory pointing to the external repo
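To see what that commit actually records, the sketch below builds two throwaway local repositories in a temporary directory and adds one to the other as a submodule. The sandbox paths, file contents, and demo identity are assumptions for illustration. The `protocol.file.allow=always` override is only there because the sandbox uses local file paths instead of SSH URLs.

```shell
set -eu

# Throwaway sandbox; nothing here touches your real repositories.
sandbox="$(mktemp -d)"

# Demo identity (assumption) so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the external shared-utils repository.
git init -q "$sandbox/shared-utils"
echo 'def helper(): pass' > "$sandbox/shared-utils/utils.py"
git -C "$sandbox/shared-utils" add utils.py
git -C "$sandbox/shared-utils" commit -q -m "Add utils"

# Stand-in for the main project repository.
git init -q "$sandbox/main-repo"
git -C "$sandbox/main-repo" -c protocol.file.allow=always \
  submodule --quiet add "$sandbox/shared-utils" libs/shared-utils
git -C "$sandbox/main-repo" commit -q -m "Add shared-utils submodule"

# The index entry for the submodule path: mode 160000 plus a commit hash.
gitlink_entry="$(git -C "$sandbox/main-repo" ls-files --stage libs/shared-utils)"
recorded_commit="$(git -C "$sandbox/shared-utils" rev-parse HEAD)"
echo "$gitlink_entry"
```

The printed entry starts with mode 160000, which is how Git marks a submodule: the main repository stores only the commit hash of the external repository, not its files.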
Configure Valohai Access
Valohai needs access to both your main repository and all submodule repositories.
Option 1: Use the Same SSH Key (Recommended)
The simplest approach is to use one SSH key across all repositories.
For GitHub: GitHub deploy keys are repository-specific, so you can't reuse them. Instead:
Generate an SSH key pair in Valohai
Add the public key to your GitHub account (not as a deploy key):
Go to Settings (your account) → SSH and GPG keys → New SSH key
Add the private key to Valohai project settings
Now Valohai can access your main repo and all submodules.
For GitLab: GitLab allows shared deploy keys:
Add the deploy key to your main repository
Go to each submodule repository
Navigate to Settings → Repository → Deploy keys → Privately accessible deploy keys
Enable the same key for each submodule
For Bitbucket: Bitbucket allows reusing deploy keys:
Add the same deploy key to your main repository
Add the same deploy key to each submodule repository
No special configuration needed—Bitbucket handles this automatically.
Option 2: Clone Submodules During Execution
If you can't use the same SSH key, clone submodules manually in your step commands:
- step:
    name: train-with-submodules
    image: python:3.11
    command:
      - apt-get update && apt-get install -y git
      - echo -e "$SUBMODULE_KEY" > ~/submodule_key
      - chmod 600 ~/submodule_key
      - export GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=no -i ~/submodule_key"
      - git submodule update --init --recursive
      - python train.py
Store SUBMODULE_KEY as a secret environment variable in project settings.
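The key-writing commands can also be wrapped in a small helper. This is a minimal sketch, assuming SUBMODULE_KEY holds the private key with its line breaks stored either as real newlines or as `\n` escapes. The function name `write_submodule_key` is hypothetical and not part of Valohai.

```shell
# Hypothetical helper: write $SUBMODULE_KEY to the given path with
# owner-only permissions.
write_submodule_key() {
  key_file="$1"
  (
    # Restrict permissions before the file is created. The subshell keeps
    # the umask change from leaking into the calling shell.
    umask 077
    # %b expands backslash escapes such as \n, similar to `echo -e`.
    printf '%b\n' "$SUBMODULE_KEY" > "$key_file"
  )
  # Also covers the case where the file already existed with wider permissions.
  chmod 600 "$key_file"
}
```

A step could then call `write_submodule_key ~/submodule_key` before setting GIT_SSH_COMMAND.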
Working with Submodules
After adding a submodule, team members must initialize it:
# Clone the main repository
git clone git@github.com:username/main-repo.git
cd main-repo
# Initialize submodules
git submodule update --init --recursive
Update a submodule to a newer commit:
cd libs/shared-utils
git pull origin main
cd ../..
git add libs/shared-utils
git commit -m "Update shared-utils to latest"
git push
Valohai automatically fetches submodules if your SSH key has access to all repositories. Note that the repository size limits still apply:
Maximum compressed commit size: 1 GB
Warning threshold: 100 MB (may cause slow fetches or timeouts)
Submodule URL Formats
Submodules must use SSH format for private repositories:
# Correct - SSH format
git@github.com:username/repo.git
# Incorrect - HTTPS won't work with deploy keys
https://github.com/username/repo.git
Check your .gitmodules file:
[submodule "libs/shared-utils"]
    path = libs/shared-utils
    url = git@github.com:username/shared-utils.git
Troubleshooting
"Submodule initialization failed"
Valohai's SSH key doesn't have access to the submodule repository
Solution: Add the public key to the submodule repo (see provider-specific guides)
"Submodule directory is empty"
The submodule wasn't initialized
Solution: Make sure .gitmodules is committed and Valohai has fetched the latest commit
"Permission denied for submodule"
SSH key works for main repo but not submodules
Solution: Use a single SSH key with access to all repos (see Option 1 above)
"Submodule points to wrong commit"
You updated the submodule but didn't commit the reference
Solution: After updating a submodule, commit the change in the main repo
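Several of these symptoms show up in the output of `git submodule status`, where a leading `-` marks an uninitialized submodule, `+` marks a checked-out commit that differs from the one recorded in the main repository, and `U` marks a merge conflict. The helper below is a rough sketch that turns that output into readable messages. The function name `explain_submodule_status` is hypothetical, and submodule paths containing spaces are not handled.

```shell
# Hypothetical helper: read `git submodule status` output on stdin and
# print one diagnosis per submodule.
explain_submodule_status() {
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    # The second whitespace-separated field is the submodule path.
    sub_path="$(printf '%s\n' "$line" | awk '{print $2}')"
    case "$line" in
      -*) echo "$sub_path: not initialized (run: git submodule update --init --recursive)" ;;
      +*) echo "$sub_path: checked-out commit differs from the commit recorded in the main repo" ;;
      U*) echo "$sub_path: merge conflict" ;;
      *)  echo "$sub_path: OK" ;;
    esac
  done
}
```

From the root of the main repository it could be used as `git submodule status --recursive | explain_submodule_status`.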
Alternative: Clone During Execution
If submodules are too complex, consider cloning additional repositories during execution:
- step:
    name: train-with-external-code
    image: python:3.11
    command:
      - apt-get update && apt-get install -y git
      - git clone https://github.com/username/shared-utils.git /shared-utils
      - python train.py
This is simpler but loses version tracking. See Clone Repositories During Execution for details.
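One way to recover some reproducibility with this approach is to check out an exact commit right after cloning. The helper below is a sketch under that assumption, not something Valohai provides, and the name `clone_at_commit` is hypothetical.

```shell
# Hypothetical helper: clone a repository, then check out one exact commit
# so every execution runs against the same code.
clone_at_commit() {
  repo_url="$1"
  target_dir="$2"
  commit="$3"
  git clone --quiet "$repo_url" "$target_dir" &&
    git -C "$target_dir" checkout --quiet --detach "$commit"
}
```

In a step command this could be called as `clone_at_commit https://github.com/username/shared-utils.git /shared-utils <commit-hash>`, where `<commit-hash>` is a placeholder for the commit you want to pin.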
Best Practices
Pin submodules to specific commits
Don't point to main or master branches
Use specific commit hashes for reproducibility
Update deliberately, not automatically
Keep submodules stable
Submodules should change infrequently
Frequent changes indicate the code should be merged
Document submodule setup
Add initialization instructions to your README
Explain why each submodule exists
Use shallow clones for large submodules
git submodule update --init --depth 1
Next Steps
Clone Repositories During Execution for simpler alternatives
GitHub Private Repos for SSH key setup
GitLab Private Repos for shared deploy keys
