# Managing Large YAML Files

As your project grows to 30+ steps, your `valohai.yaml` can become repetitive and hard to maintain.

YAML anchors and aliases let you define reusable blocks once and reference them everywhere, keeping your config clean and consistent.

***

## Why This Matters

**Reduce duplication**: Define common inputs, parameters, or commands once instead of copying them across dozens of steps.

**Easier updates**: Change a dataset path in one place, and it updates everywhere that references it.

**Better readability**: A 500-line YAML with anchors is easier to scan than a 2000-line file with repetition.

***

## YAML Anchors & Aliases: The Basics

### Define a reusable block with `&anchor`

```yaml
- definitions:
    my-common-inputs: &common_inputs  # <- Anchor named "common_inputs"
      - name: dataset
        default: s3://my-bucket/train.csv
      - name: config
        default: s3://my-bucket/config.yaml

- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command: python train.py
    inputs: *common_inputs  # Uses the block defined above

- step:
    name: evaluate-model
    image: tensorflow/tensorflow:2.6.0
    command: python evaluate.py
    inputs: *common_inputs  # Same inputs, no repetition
```

Both steps now share the same input definitions. Each `*common_inputs` alias is resolved to the anchored block when the file is parsed, so updating the block marked `&common_inputs` once changes both steps.
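To make the substitution concrete, here is a sketch of the hand-written equivalent of `train-model` once the alias is resolved. It introduces no new syntax; it only spells out what `*common_inputs` stands for.

```yaml
# Expanded form of the train-model step above, for illustration only.
# A YAML parser treats this and the aliased version the same way.
- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command: python train.py
    inputs:
      - name: dataset
        default: s3://my-bucket/train.csv
      - name: config
        default: s3://my-bucket/config.yaml
```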

***

## Common Use Cases

### Shared input datasets

```yaml
- definitions:
    standard-datasets: &datasets
      - name: train-set
        default: s3://data/train/*
      - name: test-set
        default: s3://data/test/*

- step:
    name: model-a
    image: python:3.13
    command: python train_a.py
    inputs: *datasets

- step:
    name: model-b
    image: python:3.13
    command: python train_b.py
    inputs: *datasets
```

### Repeated parameters

```yaml
- definitions:
    tuning-params:
      - &learning_rate_param
        name: learning_rate
        default: 0.001
        type: float
      - &weight_decay_param
        name: weight_decay
        default: 0.0001
        type: float

- step:
    name: train-cnn
    image: python:3.13
    command: python train_cnn.py {parameters}
    parameters:
      - *learning_rate_param
      - *weight_decay_param

- step:
    name: train-transformer
    image: python:3.13
    command: python train_transformer.py {parameters}
    parameters:
      ## Reference an anchor but also override certain properties (e.g. default value)
      - <<: *learning_rate_param
        default: 0.002
      - *weight_decay_param
```

### Standard commands

```yaml
- definitions:
    setup-commands: &preprocess_cmd
      - apt-get update
      - pip install -r requirements.txt
      - pip install valohai-utils
      - python preprocess.py
    long-commands:
      - &setup_env export PYTHONPATH=/app && mkdir -p /valohai/outputs/checkpoints && mkdir -p /valohai/outputs/logs

## Reuse the whole command section
- step:
    name: preprocess
    image: python:3.13
    command: *preprocess_cmd

## Reuse only some of the long commands and add some more
- step:
    name: train
    image: python:3.13
    command:
      - *setup_env
      - python train.py
```

***

## Merge and Override with `<<: *anchor`

:heavy_check_mark: If the anchor references an object (a single parameter, a single input, and so on), you can merge it with additional properties or override existing ones.

```yaml
- definitions:
    tuning-params:
      - &learning_rate_param
        name: learning_rate
        default: 0.001
        type: float
      - &weight_decay_param
        name: weight_decay
        default: 0.0001
        type: float

- step:
    name: train-transformer
    image: python:3.13
    command: python train_transformer.py {parameters}
    parameters:
      ## learning_rate_param is referencing an object
      - <<: *learning_rate_param
        default: 0.002 ## overrides the originally set default value
        optional: true ## adds a new property
      - *weight_decay_param
```
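As a sketch of the result, the first entry under `parameters` above is equivalent to writing the following by hand. Keys set next to `<<` win over keys that come from the anchor.

```yaml
# Expanded form of the merged learning_rate parameter, for illustration only.
- name: learning_rate   # from the anchor
  type: float           # from the anchor
  default: 0.002        # local value replaces the anchor's 0.001
  optional: true        # local addition
```

The anchored block itself is not modified, so any other step that aliases `*learning_rate_param` directly still gets the original default of `0.001`.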

:x: If the anchor references a list (such as a list of parameters or commands), you cannot merge it with additional elements; the alias can only be used as the entire value of a property.

```yaml
- definitions:
    setup-commands: &setup
      - apt-get update
      - pip install -r requirements.txt
      - pip install valohai-utils

- step:
    name: preprocess
    image: python:3.13
    command:
      ## The "setup" anchor references a list of strings, so the command section
      ## ends up with the structure [[string, string, string], string].
      ## This causes a lint error because the command is expected to be a list of strings.
      - *setup
      - python preprocess.py
```
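One way around this, following the same per-command pattern as `&setup_env` in the "Standard commands" example, is to anchor each command string individually and alias them one by one. The anchor names below (`apt_update`, `install_requirements`, `install_utils`) are hypothetical and chosen only for this sketch.

```yaml
- definitions:
    setup-commands:
      # Each list item gets its own anchor instead of anchoring the whole list.
      - &apt_update apt-get update
      - &install_requirements pip install -r requirements.txt
      - &install_utils pip install valohai-utils

- step:
    name: preprocess
    image: python:3.13
    command:
      # Every alias resolves to a single string, so the list stays flat.
      - *apt_update
      - *install_requirements
      - *install_utils
      - python preprocess.py
```

This is more verbose than a single list anchor, but each command is still defined in only one place.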

The following syntax may look valid, and the linter reports no errors:

```yaml
- definitions:
    base-params: &base
      - name: epochs
        default: 10
        type: integer
      - name: dataset_name
        type: string
        default: small_set

- step:
    name: quick-test
    image: python:3.13
    command: echo {parameters}
    parameters:
      - <<: *base  # Merge base parameters
      - name: debug_mode  # Add a new parameter
        default: true
        type: flag
```

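But the output of the `quick-test` step will be:

> `--epochs=10 --debug_mode`

Only the first parameter (`epochs`) is taken from the `base` anchor. When `<<` is given a list of mappings, it folds them into a single mapping in which keys from earlier entries take precedence, so `dataset_name` is dropped instead of being added as a second parameter. The `<<` operator therefore cannot be used to splice an anchored list into another list.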
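If the goal is to reuse both base parameters and add one more, a sketch that avoids `<<` on a list is to anchor each parameter object separately and alias them individually. The anchor names `epochs_param` and `dataset_name_param` are hypothetical.

```yaml
- definitions:
    base-params:
      # Anchor each parameter object, not the list that contains them.
      - &epochs_param
        name: epochs
        default: 10
        type: integer
      - &dataset_name_param
        name: dataset_name
        type: string
        default: small_set

- step:
    name: quick-test
    image: python:3.13
    command: echo {parameters}
    parameters:
      - *epochs_param
      - *dataset_name_param
      - name: debug_mode  # Add a new parameter
        default: true
        type: flag
```

Each alias now resolves to one parameter object, so the `parameters` list contains three separate entries.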

***

## Tips for Large YAML Files

**Define anchors at the top**: Keep all reusable blocks in a `definitions` section at the start of your file for easy reference.

```yaml
# Anchor definitions
- definitions:
    common-inputs: &inputs
      - name: dataset
        default: s3://bucket/data.csv

    training-params: &params
      - name: epochs
        default: 10
        type: integer

# Steps
- step:
    name: train
    image: python:3.13
    command: python train.py {parameters}
    inputs: *inputs
    parameters: *params
```

**Use descriptive anchor names**: `&training_params` is clearer than `&params1`.

**Don't over-anchor**: If a block is only used once, don't create an anchor. They're for repeated content.

**Lint regularly**: Run `vh lint` after editing anchors to catch syntax mistakes.

***

## What's Next?

* [Generate YAML with valohai-utils](/valohai.yaml-overview/generate-from-python.md) to skip writing YAML by hand (Python users)
* [Validate YAML with the linter](/valohai.yaml-overview/lint.md) to catch anchor syntax errors
* [Multiple YAML files](/valohai.yaml-overview/multiple-files.md) for monorepo organization

