# Multiple YAML Files & Monorepos

Your repository can contain multiple `valohai.yaml` files. This is useful when different teams or services share one repository but need separate ML configurations.

Each Valohai project connects to one `valohai.yaml` file, which can live anywhere in your repository.

***

## Why Multiple YAML Files?

**Monorepo management**: Different teams (data engineering, ML research, inference) maintain their own configurations without conflicts.

**Service separation**: Each microservice or model has its own isolated workflow definition.

**Environment isolation**: Dev, staging, and production pipelines use different YAML files with different resource requirements.

***

## How It Works

By default, Valohai looks for `valohai.yaml` in your repository root:

```
my-repo/
├── valohai.yaml          # Default location
├── train.py
└── preprocess.py
```

But you can point Valohai to any subfolder:

```
my-repo/
├── data-engineering/
│   └── valohai.yaml      # ETL pipelines
├── model-training/
│   └── valohai.yaml      # ML training jobs
├── inference/
│   └── valohai.yaml      # Batch prediction
└── shared/
    └── utils.py
```
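To see which paths you could configure, it can help to list every `valohai.yaml` in a checkout. This is a minimal sketch using only the Python standard library; the function name `find_valohai_yamls` is hypothetical and not part of any Valohai tooling:

```python
from pathlib import Path


def find_valohai_yamls(repo_root):
    """Return repo-relative POSIX paths of every valohai.yaml under repo_root."""
    root = Path(repo_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("valohai.yaml")
        if ".git" not in path.relative_to(root).parts  # skip Git internals
    )
```

Calling `find_valohai_yamls(".")` from the repository root returns paths relative to that root, such as `data-engineering/valohai.yaml`.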

***

## Configure Custom YAML Path

Go to **Project Settings > Repository** and set the YAML path:

```
data-engineering/valohai.yaml
```

Valohai will now use that file instead of the root `valohai.yaml`.

***

## Execution Default Working Directory

Even if your `valohai.yaml` is in a subfolder, Valohai clones your entire repository during execution. The default working directory is the root of the Git repository, not the directory that contains `valohai.yaml`.

You can reference code anywhere in the repository, using paths relative to the repository root:

```yaml
# File: model-training/valohai.yaml
- step:
    name: train
    image: python:3.9
    command:
      - python ./shared/utils.py            # Path is relative to the repository root
      - python ./model-training/train.py    # Assumes train.py sits next to this YAML file
```

The full Git commit is available in the execution, so any file in the repository can be reached with a path relative to the repository root.
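One caveat worth checking in your own setup: when Python runs a script by path (for example `python model-training/train.py`), it puts the script's directory, not the working directory, at the front of `sys.path`. Importing from `shared/` may therefore need the repository root added explicitly. This is a minimal sketch, assuming the script sits one directory below the repository root; `add_repo_root_to_path` is a hypothetical helper, not a Valohai API:

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_file, levels_up=1):
    """Put the repository root on sys.path so `import shared...` resolves.

    Assumes the script lives `levels_up` directories below the repo root.
    """
    repo_root = Path(script_file).resolve().parents[levels_up]
    if str(repo_root) not in sys.path:
        sys.path.insert(0, str(repo_root))
    return repo_root


# Hypothetical usage at the top of model-training/train.py:
#   add_repo_root_to_path(__file__)
#   from shared import utils
```

Setting `PYTHONPATH=.` in the step command is another way to get the same effect without touching the script.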

***

## Launch Jobs from CLI with Custom YAML

When running jobs from the command line, how you select the YAML file depends on whether the job is ad-hoc or based on a Git commit.

### Ad-hoc Execution

For ad-hoc jobs (run from your local files, without a Git commit), use the `--yaml` flag:

```shell
vh exec run train-model --adhoc --yaml model-training/valohai.yaml
```
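If you launch ad-hoc runs for several sub-projects from a script, the same command can be assembled programmatically. This is a minimal sketch that only builds the argument list from the flags shown above; `build_adhoc_command` is a hypothetical helper, and the commented-out `subprocess.run` call assumes the `vh` CLI is installed and logged in:

```python
import shlex


def build_adhoc_command(step, yaml_path):
    """Build the `vh exec run` argument list for an ad-hoc run with a custom YAML."""
    return ["vh", "exec", "run", step, "--adhoc", "--yaml", yaml_path]


command = build_adhoc_command("train-model", "model-training/valohai.yaml")
print(shlex.join(command))  # show the command before running it
# To actually launch it (assumption: `vh` is on PATH and authenticated):
#   import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues if a YAML path ever contains spaces.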

### From Git Commit

For jobs based on a Git commit, first set the YAML path in the web UI (as shown above).

Then run:

```shell
vh exec run train-model --commit abc123
```

Or use project mode to launch from the latest fetched commit:

```shell
vh --project-mode remote --project <project-id> exec run train-model
```

***

## Example: Monorepo Structure

Here's an example monorepo layout:

```
ml-platform/
├── data-ingestion/
│   ├── valohai.yaml       # ETL and data validation
│   └── ingest.py
├── training/
│   ├── valohai.yaml       # Model training
│   ├── train.py
│   └── evaluate.py
├── inference/
│   ├── valohai.yaml       # Batch and real-time inference
│   └── predict.py
└── shared/
    ├── preprocessing.py
    └── metrics.py
```

Create three Valohai projects:

1. **Data Ingestion Project** → points to `data-ingestion/valohai.yaml`
2. **Training Project** → points to `training/valohai.yaml`
3. **Inference Project** → points to `inference/valohai.yaml`

Each team works independently but shares the `shared/` utilities.

***

## Best Practices

**Use descriptive paths**: Name folders by function (`training/`, `inference/`), not by team (`team-a/`, `team-b/`).

**Share common code**: Put reusable utilities in a `shared/` or `common/` directory accessible to all projects.

**Coordinate dependencies**: If jobs in one project depend on outputs from another project, use [dataset versioning](https://github.com/valohai/dokuhai/blob/main/concepts/datasets.md) to pass data between projects.

**Keep YAML close to code**: Place `valohai.yaml` in the directory where the relevant Python scripts live for easier navigation.

***

## What's Next?

* [Generate YAML with valohai-utils](https://docs.valohai.com/valohai.yaml-overview/generate-from-python) to skip writing YAML by hand (Python users)
* [Validate your YAML](https://docs.valohai.com/valohai.yaml-overview/lint) with the linter
* [Manage large YAML files](https://docs.valohai.com/valohai.yaml-overview/large-yaml) with anchors

