Managing Large YAML Files

As your project grows to 30+ steps, your valohai.yaml can become repetitive and hard to maintain.

YAML anchors and aliases let you define a reusable block once and reference it wherever it is needed, keeping your config clean and consistent.


Why This Matters

Reduce duplication: Define common inputs, parameters, or commands once instead of copying them across dozens of steps.

Easier updates: Change a dataset path in one place, and it updates everywhere that references it.

Better readability: A 500-line YAML with anchors is easier to scan than a 2000-line file with repetition.


YAML Anchors & Aliases: The Basics

Define a reusable block with &anchor

- definitions:
    my-common-inputs: &common_inputs  # <- Anchor named "common_inputs"
      - name: dataset
        default: s3://my-bucket/train.csv
      - name: config
        default: s3://my-bucket/config.yaml

- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command: python train.py
    inputs: *common_inputs  # Uses the block defined above

- step:
    name: evaluate-model
    image: tensorflow/tensorflow:2.6.0
    command: python evaluate.py
    inputs: *common_inputs  # Same inputs, no repetition

Both steps now share the same input definitions. Update the block anchored by &common_inputs once, and both steps pick up the change.


Common Use Cases

Shared input datasets
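
An anchor can also mark a single input rather than a whole list, so that steps can combine the shared dataset with their own inputs. The following is a minimal sketch, assuming a hypothetical anchor name (&dataset_input) and reusing the illustrative bucket paths from the basics example:

```yaml
- definitions:
    dataset-input: &dataset_input      # hypothetical anchor on one input
      name: dataset
      default: s3://my-bucket/train.csv

- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command: python train.py
    inputs:
      - *dataset_input                 # shared dataset
      - name: config                   # step-specific input
        default: s3://my-bucket/config.yaml

- step:
    name: evaluate-model
    image: tensorflow/tensorflow:2.6.0
    command: python evaluate.py
    inputs:
      - *dataset_input                 # same dataset, no extra inputs
```

Because the alias is used as a list item, each step keeps control over which other inputs appear next to it.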

Repeated parameters
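
When several steps accept the same hyperparameters, the whole parameter list can be anchored once. This is a sketch under assumptions: the anchor name (&training_params), the parameter names and defaults, and the fine_tune.py script are hypothetical.

```yaml
- definitions:
    training-params: &training_params  # hypothetical anchor on a parameter list
      - name: epochs
        type: integer
        default: 10
      - name: learning_rate
        type: float
        default: 0.001

- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command: python train.py {parameters}
    parameters: *training_params       # populates the entire property

- step:
    name: fine-tune-model
    image: tensorflow/tensorflow:2.6.0
    command: python fine_tune.py {parameters}
    parameters: *training_params
```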

Standard commands
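
Anchors work on plain strings too, which is useful for a setup command that many steps run before their own script. A minimal sketch, assuming a hypothetical anchor (&install_deps), a requirements.txt file in the repository, and an illustrative python:3.11 image:

```yaml
- definitions:
    install-deps: &install_deps pip install -r requirements.txt  # hypothetical anchor on one command string

- step:
    name: train-model
    image: python:3.11
    command:
      - *install_deps                  # shared setup command
      - python train.py

- step:
    name: evaluate-model
    image: python:3.11
    command:
      - *install_deps
      - python evaluate.py
```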


Merge and Override with <<: *anchor

✔️ If the anchor references an object (a single parameter, a single input, and so on), you can merge it with additional properties or override the existing ones.

❌ If the anchor references a list (such as a list of parameters or commands), you cannot merge additional elements into it; the alias can only populate the entire property.
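
The object case can be sketched as follows. The anchor name (&epochs_param), the full-training step, the image, and all values are hypothetical and chosen only for illustration:

```yaml
- definitions:
    epochs-param: &epochs_param        # hypothetical anchor on a single parameter (an object)
      name: epochs
      type: integer
      default: 10

- step:
    name: full-training
    image: python:3.11
    command: python train.py {parameters}
    parameters:
      - *epochs_param                  # used unchanged

- step:
    name: quick-test
    image: python:3.11
    command: python train.py {parameters}
    parameters:
      - <<: *epochs_param              # merge the anchored object...
        default: 1                     # ...override an existing property
        description: Short run for smoke testing  # ...and add a new one
```

Keys written next to the merge key take precedence over the merged ones, so quick-test keeps the name and type from the anchor but uses its own default.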

The next syntax might appear valid, and the linter will not report any errors ...
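
One configuration of this kind, sketched under assumptions (the anchor name &base_params, the learning_rate parameter, the image, and the script name are hypothetical), places the merge key in a list item and points it at an anchored list:

```yaml
- definitions:
    base-params: &base_params          # hypothetical anchor on a LIST of parameters
      - name: epochs
        type: integer
        default: 10
      - name: learning_rate
        type: float
        default: 0.001

- step:
    name: quick-test
    image: python:3.11
    command: python train.py {parameters}
    parameters:
      - <<: *base_params               # attempt to pull in every base parameter
      - name: debug_mode               # attempt to append one more
        type: flag
        default: true
```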

But the output of the quick-test step will be:

--epochs=10 --debug_mode

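This shows that only the first parameter (epochs) is taken from the base anchor. When the merge key is given a list of objects, YAML merges them into a single object in which earlier entries take precedence, so the list is collapsed rather than extended. Merging lists with anchors and the << operator is therefore not possible.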


Tips for Large YAML Files

Define anchors at the top: Keep all reusable blocks in a definitions section at the start of your file for easy reference.

Use descriptive anchor names: &training_params is clearer than &params1.

Don't over-anchor: If a block is only used once, don't create an anchor. Anchors are for repeated content.

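Lint regularly: Run vh lint after editing anchors to catch syntax mistakes. Keep in mind that the linter does not flag an attempted list merge, so check the resolved parameters as well.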

