# Clone Repositories During Execution

Sometimes you need to access code or files from another repository during execution—like shared utilities, model definitions, or configuration files. This guide shows how to securely clone private repositories at runtime.

> :bulb: **Reproducibility note:** Valohai tracks the commit of your connected repository automatically. Commits from other repositories cloned during execution are **not tracked**. For full reproducibility, consider using [Git submodules](https://docs.valohai.com/git-integration/advanced-topics/submodules) instead.

## Use Cases

Clone additional repositories when:

* You need utility functions from a shared library
* Model definitions are in a separate repo
* Configuration files are centrally managed
* You're testing code before making it a submodule

**Don't use this for:**

* Datasets (use Valohai data inputs instead)
* Model checkpoints (use execution outputs)
* Anything that changes frequently (consider merging repos)

## Clone a Private Repository

### Step 1: Generate SSH Key

Create an SSH key pair for accessing the external repository:

```shell
# -N "" sets an empty passphrase so the key works without an interactive prompt
ssh-keygen -t rsa -b 4096 -N "" -f valohai-external-key
```

This creates:

* `valohai-external-key.pub` – Public key (add to Git provider)
* `valohai-external-key` – Private key (add to Valohai)

### Step 2: Add Public Key to Git Provider

Add the public key as a deploy key on the repository you want to clone:

**GitHub:**

1. Go to the repository → **Settings** → **Deploy keys** → **Add deploy key**
2. Paste the contents of `valohai-external-key.pub`
3. Leave "Allow write access" **unchecked**

**GitLab:**

1. Go to the repository → **Settings** → **Repository** → **Deploy keys**
2. Paste the contents of `valohai-external-key.pub`

**Bitbucket:**

1. Go to the repository → **Settings** → **Access keys** → **Add key**
2. Paste the contents of `valohai-external-key.pub`

### Step 3: Add Private Key to Valohai

Store the private key as a secret environment variable:

1. Open your Valohai project
2. Go to **Settings** → **Environment Variables**
3. Add a new variable:
   * **Name:** `EXTERNAL_REPO_KEY`
   * **Value:** The private key as a single line, with each newline replaced by the literal characters `\n`
4. Check **Secret** to hide the value from logs
5. Click **Save**

**Important:** Valohai doesn't encode newlines automatically. Format your key like this:

```
-----BEGIN OPENSSH PRIVATE KEY-----\n<key content>\n-----END OPENSSH PRIVATE KEY-----
```
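
Converting the key by hand is error-prone. As a minimal sketch, the following Python script produces the single-line value on your own machine. The function name `escape_newlines` is hypothetical, and the file name is assumed to be the one generated in Step 1.

```python
from pathlib import Path


def escape_newlines(key_text: str) -> str:
    """Join the lines of a private key with literal backslash-n markers."""
    # strip() drops the trailing newline; splitlines() handles \n and \r\n
    return "\\n".join(key_text.strip().splitlines())


if __name__ == "__main__":
    key_file = Path("valohai-external-key")  # private key from Step 1 (assumed name)
    if key_file.exists():
        # Paste this output into the environment variable value
        print(escape_newlines(key_file.read_text()))
```

Run it only on a trusted machine, since it prints the private key to the terminal.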

### Step 4: Clone in Your Step

Update your `valohai.yaml` to clone the repository during execution:

```yaml
- step:
    name: train-with-external-repo
    image: python:3.11
    command:
      # Install Git
      - apt-get update
      - apt-get install -y git

      # Write the SSH key to a file (quote the variable so the shell passes it through unchanged)
      - echo -e "$EXTERNAL_REPO_KEY" > ~/external_key
      - chmod 600 ~/external_key

      # Configure Git to use the key
      - export GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=no -i ~/external_key"

      # Clone the external repository
      - git clone git@github.com:username/external-repo.git /external-repo

      # Verify the clone
      - ls -la /external-repo

      # Run your training script
      - python train.py
```

The cloned repository will be available at `/external-repo` during execution.
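
If you would rather do this setup from Python than from shell commands, the same steps might look like the sketch below. The helper names (`write_key`, `git_ssh_env`, `clone`) are hypothetical. The sketch assumes Git is already installed in the image and that `EXTERNAL_REPO_KEY` holds the key in the single-line format from Step 3.

```python
import os
import shlex
import subprocess
from pathlib import Path


def write_key(escaped_key: str, key_path: Path) -> Path:
    """Expand literal \\n markers and write the key with owner-only permissions."""
    key_text = escaped_key.replace("\\n", "\n")
    if not key_text.endswith("\n"):
        key_text += "\n"  # some OpenSSH versions reject a key file without a final newline
    key_path.touch(mode=0o600)  # create the file restricted before writing the secret
    key_path.write_text(key_text)
    key_path.chmod(0o600)  # enforce the mode even if the file already existed
    return key_path


def git_ssh_env(key_path: Path) -> dict:
    """Copy the current environment and point Git at the deploy key."""
    env = dict(os.environ)
    env["GIT_SSH_COMMAND"] = (
        f"ssh -o StrictHostKeyChecking=no -i {shlex.quote(str(key_path))}"
    )
    return env


def clone(url: str, dest: str, key_path: Path) -> None:
    """Run `git clone`; raises CalledProcessError if the clone fails."""
    subprocess.run(["git", "clone", url, dest], env=git_ssh_env(key_path), check=True)


# Example usage inside an execution (placeholder URL from the step above):
# key = write_key(os.environ["EXTERNAL_REPO_KEY"], Path.home() / "external_key")
# clone("git@github.com:username/external-repo.git", "/external-repo", key)
```

Passing the environment to `subprocess.run` keeps the key setting scoped to the clone instead of exporting it for every later command.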

## Access Files from the Cloned Repo

Use the cloned repository in your Python code:

```python
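import sys

import yaml  # PyYAML; must be installed in the image

# Make packages inside the cloned repository importable
sys.path.append("/external-repo")

# Import modules from the external repo
# (assumes it contains a package directory named `external_repo`)
from external_repo.utils import preprocess_data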

# Or read configuration files
with open("/external-repo/config.yaml", "r") as f:
    config = yaml.safe_load(f)
```
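
If the clone step failed or was skipped, the import above fails with an error that does not mention the missing clone. One way to fail earlier with a clearer message is a small guard like this sketch. The helper name `add_repo_to_path` is hypothetical.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir: str) -> Path:
    """Add a cloned repository to sys.path, failing early if it is missing."""
    repo = Path(repo_dir)
    if not repo.is_dir():
        raise FileNotFoundError(f"{repo} not found; did the git clone step run?")
    repo_str = str(repo)
    if repo_str not in sys.path:
        sys.path.append(repo_str)  # avoid duplicate entries on repeated calls
    return repo


# Example usage: add_repo_to_path("/external-repo")
```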

## Clone a Public Repository

Public repositories are simpler—no SSH key needed:

```yaml
- step:
    name: train-with-public-repo
    image: python:3.11
    command:
      - apt-get update && apt-get install -y git
      - git clone https://github.com/username/public-repo.git /public-repo
      - python train.py
```

## Clone Multiple Repositories

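You can clone multiple repos in the same execution. This example reuses one key for both clones, which only works if that key has access to both repositories. GitHub, for example, allows a deploy key to be attached to only one repository, so there you need one key per repository, with `GIT_SSH_COMMAND` set before each clone: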

```yaml
- step:
    name: train-with-multiple-repos
    image: python:3.11
    command:
      - apt-get update && apt-get install -y git

      # Set up SSH key (quote the variable so the shell passes it through unchanged)
      - echo -e "$EXTERNAL_REPO_KEY" > ~/key
      - chmod 600 ~/key
      - export GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=no -i ~/key"

      # Clone first repo
      - git clone git@github.com:username/repo1.git /repo1

      # Clone second repo
      - git clone git@github.com:username/repo2.git /repo2

      # Use them in your script
      - python train.py
```
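
When each repository has its own deploy key, a loop keeps the per-repository settings in one place. The sketch below is one possible layout, not a Valohai feature: the `REPOS` list, the key file paths, and the helper names are all hypothetical, and it assumes each key file has already been written as in Step 4.

```python
import os
import subprocess

# Hypothetical layout: (SSH URL, destination directory, key file) per repository
REPOS = [
    ("git@github.com:username/repo1.git", "/repo1", "~/key1"),
    ("git@github.com:username/repo2.git", "/repo2", "~/key2"),
]


def clone_plan(repos):
    """Pair each `git clone` argument list with the SSH command for its key."""
    plan = []
    for url, dest, key_file in repos:
        key_path = os.path.expanduser(key_file)
        ssh_command = f"ssh -o StrictHostKeyChecking=no -i {key_path}"
        plan.append((["git", "clone", url, dest], ssh_command))
    return plan


def clone_all(repos):
    """Clone each repository with its own key; stop at the first failure."""
    for argv, ssh_command in clone_plan(repos):
        env = dict(os.environ, GIT_SSH_COMMAND=ssh_command)
        subprocess.run(argv, env=env, check=True)


# Example usage inside an execution: clone_all(REPOS)
```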

## Clone Specific Branch or Commit

Clone a specific branch, or check out a specific commit when you need reproducibility (a branch can move, a commit cannot):

```shell
# Clone a specific branch
git clone -b feature-branch git@github.com:username/repo.git /repo

# Clone a specific commit
git clone git@github.com:username/repo.git /repo
cd /repo
git checkout abc123def
```
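
Because Valohai does not track commits of repositories cloned this way, you may want to record them yourself. The sketch below reads the checked-out commit with `git rev-parse HEAD` and stores it in a JSON file. The helper names and the file name `cloned_commits.json` are hypothetical, and the output directory in the usage comment is an assumption; adjust it to wherever your execution saves outputs.

```python
import json
import subprocess
from pathlib import Path


def head_commit(repo_dir: str) -> str:
    """Return the full SHA of the commit checked out in repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def record_commit(repo_dir: str, sha: str, out_file: Path) -> None:
    """Add or update the repository's entry in a JSON file of cloned commits."""
    records = json.loads(out_file.read_text()) if out_file.exists() else {}
    records[repo_dir] = sha
    out_file.write_text(json.dumps(records, indent=2, sort_keys=True))


# Example usage (output path is an assumption):
# record_commit("/repo", head_commit("/repo"), Path("/valohai/outputs/cloned_commits.json"))
```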

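Or use a shallow clone to save time. A shallow clone fetches only the tip of the branch, so you cannot check out an older commit from it: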

```shell
# Clone only the latest commit (faster)
git clone --depth 1 git@github.com:username/repo.git /repo
```

## Security Best Practices

**Use read-only deploy keys**

* Don't grant write access
* Create separate keys for each external repo
* Rotate keys periodically

**Mark environment variables as secrets**

* Always check the **Secret** box in Valohai
* Secrets are hidden from logs and UI
* They're still accessible in your code

**Use SSH, not HTTPS with tokens**

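* A deploy key is scoped to one repository, while a token embedded in a clone URL often grants broader access
* They're easier to rotate
* Deploy keys don't expire by default, so clones won't start failing because a token lapsed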

**Don't log private keys**

* Never `echo $EXTERNAL_REPO_KEY` without redirecting to a file
* Be careful with `set -x` or similar debugging flags

## Troubleshooting

**"Permission denied (publickey)"**

* The SSH key wasn't added correctly
* Check that the public key is in the Git provider's deploy keys
* Verify the private key format (must include `\n` for newlines)

**"Host key verification failed"**

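* Use `StrictHostKeyChecking=no` in `GIT_SSH_COMMAND`
* This skips host key verification, so SSH no longer protects against a spoofed host. For stricter setups, add the Git provider's published host keys to `~/.ssh/known_hosts` instead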

**"Repository not found"**

* Check the repository URL (must be SSH format: `git@...`)
* Ensure the deploy key has access to the repo
* Verify the repository exists and is accessible

**"git: command not found"**

* Install Git in your Docker image or step command
* Use `apt-get install -y git` for Debian/Ubuntu images

**"Newline characters in key cause errors"**

* Valohai doesn't automatically encode newlines
* Replace actual newlines with `\n` in the environment variable
* The key should be one long line with `\n` markers
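
To check the variable's format without printing the key itself, you could run a small diagnostic at the start of your script. This is a sketch with a hypothetical helper name, `key_format_problems`; it only checks the format described above and does not validate the key material.

```python
import os


def key_format_problems(value: str) -> list:
    """Return human-readable format problems found in the escaped key."""
    problems = []
    body = value.strip()
    if "\n" in body:
        problems.append("contains real newlines; replace them with literal \\n")
    if "\\n" not in body:
        problems.append("contains no literal \\n markers")
    if not body.startswith("-----BEGIN "):
        problems.append("does not start with a -----BEGIN header")
    if "-----END " not in body:
        problems.append("is missing the -----END footer")
    return problems


if __name__ == "__main__":
    # Prints only the problem descriptions, never the key
    for problem in key_format_problems(os.environ.get("EXTERNAL_REPO_KEY", "")):
        print(f"EXTERNAL_REPO_KEY {problem}")
```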

## Alternative: Use Submodules

If you frequently clone the same repositories, consider Git submodules:

* [Git Submodules Guide](https://docs.valohai.com/git-integration/advanced-topics/submodules)

Submodules provide:

* Automatic version tracking
* Simpler setup (no environment variables)
* Better reproducibility

## Next Steps

* [Git Submodules](https://docs.valohai.com/git-integration/advanced-topics/submodules) for better version tracking
* [Environment Variables](https://docs.valohai.com/user-and-organization-management/getting-started/environment-variables) for managing secrets
* [Private Repositories](https://docs.valohai.com/git-integration/private-repositories) for connecting your main repo
