Managing Large YAML Files
Use YAML anchors and aliases to reduce repetition and keep configs maintainable
As your project grows to 30+ steps, your valohai.yaml can become repetitive and hard to maintain.
YAML anchors and aliases let you define reusable blocks once and reference them everywhere, keeping your config clean and consistent.
Why This Matters
Reduce duplication: Define common inputs, parameters, or commands once instead of copying them across dozens of steps.
Easier updates: Change a dataset path in one place, and it updates everywhere that references it.
Better readability: A 500-line YAML with anchors is easier to scan than a 2000-line file with repetition.
YAML Anchors & Aliases: The Basics
Define a reusable block with &anchor
- definitions:
    my-common-inputs: &common_inputs    # <- Anchor named "common_inputs"
      - name: dataset
        default: s3://my-bucket/train.csv
      - name: config
        default: s3://my-bucket/config.yaml
- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command: python train.py
    inputs: *common_inputs              # Uses the block defined above
- step:
    name: evaluate-model
    image: tensorflow/tensorflow:2.6.0
    command: python evaluate.py
    inputs: *common_inputs              # Same inputs, no repetition
Both steps now share the same input definitions. Update &common_inputs once, and both steps inherit the change.
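At parse time each alias is replaced by the anchored content, so the file above is equivalent to spelling the inputs out in both steps (the definitions entry left aside):
- step:
    name: train-model
    image: tensorflow/tensorflow:2.6.0
    command: python train.py
    inputs:
      - name: dataset
        default: s3://my-bucket/train.csv
      - name: config
        default: s3://my-bucket/config.yaml
- step:
    name: evaluate-model
    image: tensorflow/tensorflow:2.6.0
    command: python evaluate.py
    inputs:
      - name: dataset
        default: s3://my-bucket/train.csv
      - name: config
        default: s3://my-bucket/config.yaml
Anchors only change how you write the file; the parsed result the platform sees is the same.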
Common Use Cases
Shared input datasets
- definitions:
    standard-datasets: &datasets
      - name: train-set
        default: s3://data/train/*
      - name: test-set
        default: s3://data/test/*
- step:
    name: model-a
    image: python:3.13
    command: python train_a.py
    inputs: *datasets
- step:
    name: model-b
    image: python:3.13
    command: python train_b.py
    inputs: *datasets
Repeated parameters
- definitions:
    tuning-params: &hyperparams
      - name: learning_rate
        default: 0.001
        type: float
      - name: weight_decay
        default: 0.0001
        type: float
- step:
    name: train-cnn
    image: python:3.13
    command: python train_cnn.py {parameters}
    parameters: *hyperparams
- step:
    name: train-transformer
    image: python:3.13
    command: python train_transformer.py {parameters}
    parameters: *hyperparams
Standard commands
- definitions:
    setup-commands: &setup
      - apt-get update
      - pip install -r requirements.txt
      - pip install valohai-utils
- step:
    name: preprocess
    image: python:3.13
    command:
      - *setup
      - python preprocess.py
- step:
    name: train
    image: python:3.13
    command:
      - *setup
      - python train.py
Merge and Override with <<: *anchor
You can merge an anchor and add extra fields:
- definitions:
    base-params: &base
      - name: epochs
        default: 10
        type: integer
- step:
    name: quick-test
    image: python:3.13
    command: python train.py {parameters}
    parameters:
      - <<: *base           # Merge base parameters
      - name: debug_mode    # Add a new parameter
        default: true
        type: flag
This keeps the epochs parameter from &base and adds debug_mode.
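Keys written directly in a list item take precedence over keys pulled in through <<:, so the same pattern can also override a value from the anchor. A minimal sketch, reusing the &base anchor above (the long-test step name is made up for illustration):
- step:
    name: long-test
    image: python:3.13
    command: python train.py {parameters}
    parameters:
      - <<: *base       # Start from the epochs definition in &base...
        default: 100    # ...and override its default (name and type stay as defined)
This gives you a shared base definition plus per-step tweaks without repeating the full parameter block.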
Tips for Large YAML Files
Define anchors at the top: Keep all reusable blocks in a definitions section at the start of your file for easy reference.
# Anchor definitions
- definitions:
    common-inputs: &inputs
      - name: dataset
        default: s3://bucket/data.csv
    training-params: &params
      - name: epochs
        default: 10
        type: integer
# Steps
- step:
    name: train
    image: python:3.13
    command: python train.py {parameters}
    inputs: *inputs
    parameters: *params
Use descriptive anchor names: &training_params is clearer than &params1.
Don't over-anchor: If a block is only used once, don't create an anchor. They're for repeated content.
Lint regularly: Run vh lint after editing anchors to catch syntax mistakes.
What's Next?
Generate YAML with valohai-utils to skip writing YAML by hand (Python users)
Validate YAML with the linter to catch anchor syntax errors
Multiple YAML files for monorepo organization