# Multiple YAML Files & Monorepos

Your repository can contain multiple `valohai.yaml` files. This is useful when different teams or services share one repository but need separate ML configurations.

Each Valohai project connects to one `valohai.yaml` file, which can live anywhere in your repository.

***

## Why Multiple YAML Files?

**Monorepo management**: Different teams (data engineering, ML research, inference) maintain their own configurations without conflicts.

**Service separation**: Each microservice or model has its own isolated workflow definition.

**Environment isolation**: Dev, staging, and production pipelines use different YAML files with different resource requirements.

***

## How It Works

By default, Valohai looks for `valohai.yaml` in your repository root:

```
my-repo/
├── valohai.yaml          # Default location
├── train.py
└── preprocess.py
```

But you can point each Valohai project to a `valohai.yaml` in any subfolder:

```
my-repo/
├── data-engineering/
│   └── valohai.yaml      # ETL pipelines
├── model-training/
│   └── valohai.yaml      # ML training jobs
├── inference/
│   └── valohai.yaml      # Batch prediction
└── shared/
    └── utils.py
```

***

## Configure Custom YAML Path

Go to **Project Settings > Repository** and set the YAML path:

```
data-engineering/valohai.yaml
```

Valohai will now use that file instead of the root `valohai.yaml`.
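To see which values you could enter in that field, it can help to list every `valohai.yaml` in the repository. The following is a minimal sketch, not part of Valohai itself: the helper name `find_valohai_yamls` is hypothetical, and it assumes the setting expects a path relative to the repository root, as in the example above.

```python
from pathlib import Path


def find_valohai_yamls(repo_root):
    """Return repo-relative POSIX paths of every valohai.yaml under repo_root."""
    root = Path(repo_root)
    found = []
    for path in root.rglob("valohai.yaml"):
        relative = path.relative_to(root)
        # Skip files inside hidden directories such as .git
        if any(part.startswith(".") for part in relative.parts):
            continue
        found.append(relative.as_posix())
    return sorted(found)


if __name__ == "__main__":
    # Run from the repository root to print one candidate YAML path per line.
    for yaml_path in find_valohai_yamls("."):
        print(yaml_path)
```

Each printed line is a candidate for the YAML path setting of one project.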

***

## Execution Default Working Directory

Even if your `valohai.yaml` is in a subfolder, Valohai clones your entire repository during execution. The default working directory is the root of the Git repository, not the folder that contains `valohai.yaml`.

You can reference code from anywhere:

```yaml
# File: model-training/valohai.yaml
- step:
    name: train
    image: python:3.9
    command:
      - python ./shared/utils.py            # Paths resolve from the repository root
      - python model-training/train.py      # Not relative to this YAML file's folder
```

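The full Git commit is checked out, so every file in the repository is available to your commands. Relative paths in commands resolve against the repository root, for example `shared/utils.py`.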
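One caveat for Python imports: when a script is launched as `python model-training/train.py`, Python puts the script's own folder on `sys.path`, not the working directory, so `shared/` may not be importable without extra setup. The sketch below shows one possible workaround. The helper name `add_repo_root_to_path` is hypothetical, and it assumes the script sits a known number of folders below the repository root.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_file, levels_up=1):
    """Put the repository root on sys.path so top-level folders are importable.

    Assumes the script sits `levels_up` directories below the repository
    root, e.g. model-training/train.py -> levels_up=1.
    """
    repo_root = Path(script_file).resolve().parents[levels_up]
    if str(repo_root) not in sys.path:
        sys.path.insert(0, str(repo_root))
    return repo_root


# Intended use at the top of model-training/train.py:
#
#     add_repo_root_to_path(__file__)
#     from shared import utils
```

An alternative with the same effect is to set `PYTHONPATH=.` in the step command, since the command runs from the repository root.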

***

## Launch Jobs from CLI with Custom YAML

How you select the YAML file from the command line depends on whether the job is ad-hoc or based on a Git commit.

### Adhoc Execution

For ad-hoc jobs (no Git commit), pass the YAML path with the `--yaml` flag:

```shell
vh exec run train-model --adhoc --yaml model-training/valohai.yaml
```

### From Git Commit

For jobs based on a Git commit, first set the YAML path in the web UI under **Project Settings > Repository**, as described in Configure Custom YAML Path.

Then run:

```shell
vh exec run train-model --commit abc123
```

Or use remote project mode to launch from the latest fetched commit:

```shell
vh --project-mode remote --project <project-id> exec run train-model
```
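If you launch ad-hoc jobs for several subprojects regularly, a small wrapper can keep each step paired with its YAML path. This is only a sketch: it reuses the `vh exec run ... --adhoc --yaml ...` form shown above and nothing else from the CLI. The function names and the step name `run-etl` are hypothetical, and it defaults to a dry run that prints the command without calling `vh`.

```python
import shlex
import subprocess


def build_adhoc_command(step_name, yaml_path):
    """Build the ad-hoc `vh exec run` command for one subproject."""
    return ["vh", "exec", "run", step_name, "--adhoc", "--yaml", yaml_path]


def launch_adhoc(step_name, yaml_path, dry_run=True):
    """Print the command; only call the Valohai CLI when dry_run is False."""
    command = build_adhoc_command(step_name, yaml_path)
    print(shlex.join(command))
    if not dry_run:
        # Requires the `vh` CLI to be installed and configured.
        subprocess.run(command, check=True)
    return command


# Hypothetical step names; replace with steps defined in each YAML file.
SUBPROJECT_STEPS = {
    "data-engineering/valohai.yaml": "run-etl",
    "model-training/valohai.yaml": "train-model",
}

for yaml_path, step_name in SUBPROJECT_STEPS.items():
    launch_adhoc(step_name, yaml_path)  # dry run: prints the command only
```

Pass `dry_run=False` once the printed commands look right.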

***

## Example: Monorepo Structure

Here's an example monorepo layout with one YAML file per team:

```
ml-platform/
├── data-ingestion/
│   ├── valohai.yaml       # ETL and data validation
│   └── ingest.py
├── training/
│   ├── valohai.yaml       # Model training
│   ├── train.py
│   └── evaluate.py
├── inference/
│   ├── valohai.yaml       # Batch and real-time inference
│   └── predict.py
└── shared/
    ├── preprocessing.py
    └── metrics.py
```

Create three Valohai projects:

1. **Data Ingestion Project** → points to `data-ingestion/valohai.yaml`
2. **Training Project** → points to `training/valohai.yaml`
3. **Inference Project** → points to `inference/valohai.yaml`

Each team works independently but shares the `shared/` utilities.

***

## Best Practices

**Use descriptive paths**: Name folders by function (`training/`, `inference/`), not by team (`team-a/`, `team-b/`).

**Share common code**: Put reusable utilities in a `shared/` or `common/` directory accessible to all projects.

**Coordinate dependencies**: If one YAML depends on outputs from another, use [dataset versioning](https://github.com/valohai/dokuhai/blob/main/concepts/datasets.md) to pass data between projects.

**Keep YAML close to code**: Place `valohai.yaml` in the directory where the relevant Python scripts live for easier navigation.

***

## What's Next?

* [Generate YAML with valohai-utils](https://docs.valohai.com/valohai.yaml-overview/generate-from-python) to skip writing YAML by hand (Python users)
* [Validate your YAML](https://docs.valohai.com/valohai.yaml-overview/lint) with the linter
* [Manage large YAML files](https://docs.valohai.com/valohai.yaml-overview/large-yaml) with anchors
