Add Context to Your Files
Your output files shouldn't exist in isolation. Attach experiment details, quality metrics, and production context directly to your files so your team can find, understand, and trust your data.
The Problem
Without metadata, files become black boxes:
Which experiment produced this model?
What was the validation accuracy?
Is this the production-ready version?
What preprocessing was applied to this dataset?
Tracking this information in spreadsheets, wikis, or README files breaks down as projects scale. Valohai solves this by collecting experiment and lineage metadata automatically, and letting you attach additional context directly to files.
Three Types of Metadata
Valohai supports three types of metadata, from simple to sophisticated:
1. Tags — Simple Labels
Organize and filter files with text labels.
Use for: Categorization, status tracking, quick filtering
Example: ["validated", "production", "experiment-42"]
Learn more: Organize Files with Tags
2. Aliases — Stable Pointers
Create human-readable shortcuts to specific files that can be updated over time.
Use for: Production references, "latest" pointers, team coordination
Example: datum://model-prod always points to current production model
Learn more: Create File Shortcuts with Aliases
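For instance, a later execution can consume whatever the alias currently points to by using the datum:// URL as an input. A minimal valohai.yaml sketch (the step and input names here are hypothetical):

```yaml
- step:
    name: predict
    image: python:3.11
    command: python predict.py
    inputs:
      - name: model
        # Resolves to whichever file the alias points to at run time
        default: datum://model-prod
```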
3. Custom Properties — Rich Data
Store any structured data in JSON format.
Use for: Experiment tracking, quality metrics, production metadata
Example: {"accuracy": 0.95, "factory": "EU", "stage": "release"}
Learn more: Track Custom Metadata
Quick Comparison

| Type | Format | Updatable | Example use |
| --- | --- | --- | --- |
| Tags | List of strings | Yes | Mark files as "validated" or "production-ready" |
| Aliases | Single string pointer | Yes (pointer updates) | Point "model-prod" to latest approved model |
| Properties | Any JSON | Yes | Store {"accuracy": 0.95, "hyperparams": {...}} |
💡 Tags and aliases are actually special property keys (valohai.tags and valohai.alias). You can combine all three in the same metadata file.
How to Add Metadata
You have three options for adding metadata to your files. Choose based on when you want to add it and how many files you're processing.
Decision Tree
┌─ Saving 1-2 files?
│   └─→ Use sidecar files (.metadata.json)
│
├─ Saving 3+ files?
│   └─→ Use single metadata file (valohai.metadata.jsonl) ← RECOMMENDED
│
└─ After execution completes?
    └─→ Use API

Method 1: Sidecar Files (1-2 Files)
Save a .metadata.json file alongside each output file.
Naming Rules (Critical!)
The metadata file must have the exact same name as your output file, plus .metadata.json:
Correct:
model.pkl → model.pkl.metadata.json
data.csv → data.csv.metadata.json
results.json → results.json.metadata.json
Wrong:
model.pkl → model.metadata.json (missing .pkl)
model.pkl → metadata.json (missing full filename)
data.csv → data.csv.meta.json (wrong extension)

Python Example
import json

# Your metadata (tags, alias, and custom properties)
metadata = {
    "valohai.tags": ["validated", "production"],
    "valohai.alias": "model-prod",
    "accuracy": 0.95,
    "epochs": 100
}

# Save your output file (model is your trained model object)
save_path = '/valohai/outputs/model.pkl'
model.save(save_path)

# Save the sidecar metadata file next to it
metadata_path = f'{save_path}.metadata.json'
with open(metadata_path, 'w') as f:
    json.dump(metadata, f)

Method 2: Single Metadata File (3+ Files) — RECOMMENDED
When processing many files, creating individual .metadata.json files is tedious. Use one valohai.metadata.jsonl file instead.
Why This Is Better
Without JSONL (tedious):
100 output files = 200 total files
/valohai/outputs/image_001.jpg
/valohai/outputs/image_001.jpg.metadata.json
/valohai/outputs/image_002.jpg
/valohai/outputs/image_002.jpg.metadata.json
... (98 more pairs)

With JSONL (clean):
100 output files = 101 total files
/valohai/outputs/image_001.jpg
/valohai/outputs/image_002.jpg
... (98 more images)
/valohai/outputs/valohai.metadata.jsonl ← One file for all metadata

Format Requirements
Filename: Must be exactly valohai.metadata.jsonl
Location: /valohai/outputs/valohai.metadata.jsonl
Format: JSON Lines (JSONL) — one JSON object per line, newline-separated
Each line must have this structure:

{"file": "output_filename.ext", "metadata": {"your": "properties"}}

⚠️ Important: JSONL requires a newline (\n) after each JSON object. Missing newlines will cause parsing errors.
⚠️ If an output file is not saved directly under /valohai/outputs/ but inside one or more subdirectories (e.g. /valohai/outputs/subdir/subdir_2/file.txt), those subdirectories must be included in the file field inside valohai.metadata.jsonl. For example, for the file /valohai/outputs/subdir/subdir_2/file.txt, the value of file should be subdir/subdir_2/file.txt.
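As a sketch, one way to get the file value right for nested outputs is to compute it relative to the outputs directory (the nested path and metadata here are illustrative):

```python
import json
import os

# Hypothetical nested output file
nested_path = '/valohai/outputs/subdir/subdir_2/file.txt'

# The "file" value is the path relative to /valohai/outputs/
relative_file = os.path.relpath(nested_path, '/valohai/outputs')

entry = {"file": relative_file, "metadata": {"note": "example"}}
line = json.dumps(entry)
# relative_file is "subdir/subdir_2/file.txt"
```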
Python Example
import json

# Process many files (processed_image, scores, times are illustrative)
for i in range(100):
    # Save output file
    image_path = f'/valohai/outputs/image_{i:03d}.jpg'
    processed_image.save(image_path)

# Create a single metadata file covering all outputs
metadata_path = '/valohai/outputs/valohai.metadata.jsonl'
with open(metadata_path, 'w') as f:
    for i in range(100):
        metadata_entry = {
            "file": f"image_{i:03d}.jpg",
            "metadata": {
                "quality_score": scores[i],
                "processing_time": times[i],
                "valohai.tags": ["processed", "batch-2024-Q1"]
            }
        }
        json.dump(metadata_entry, f)
        f.write('\n')  # Critical: newline after each entry

Common JSONL Mistakes
# Wrong: Missing newlines
with open('/valohai/outputs/valohai.metadata.jsonl', 'w') as f:
    json.dump({"file": "file1.jpg", "metadata": {...}}, f)
    json.dump({"file": "file2.jpg", "metadata": {...}}, f)  # No \n!

# Correct: Newline after each object
with open('/valohai/outputs/valohai.metadata.jsonl', 'w') as f:
    json.dump({"file": "file1.jpg", "metadata": {...}}, f)
    f.write('\n')
    json.dump({"file": "file2.jpg", "metadata": {...}}, f)
    f.write('\n')

Helper Function
Create a reusable helper for your projects:
import json

def save_metadata_jsonl(file_metadata_dict, output_dir='/valohai/outputs'):
    """
    Save metadata for multiple files in JSONL format.

    Args:
        file_metadata_dict: Dict mapping filenames to metadata dicts,
            e.g. {"model.pkl": {"accuracy": 0.95}}
    """
    metadata_path = f'{output_dir}/valohai.metadata.jsonl'
    with open(metadata_path, 'w') as f:
        for filename, metadata in file_metadata_dict.items():
            json.dump({"file": filename, "metadata": metadata}, f)
            f.write('\n')

# Usage
file_metadata = {
    "model.pkl": {"accuracy": 0.95, "valohai.alias": "model-prod"},
    "data.csv": {"rows": 10000, "valohai.tags": ["validated"]},
    "results.json": {"experiments": 42}
}
save_metadata_jsonl(file_metadata)

With valohai-utils
The valohai-utils package provides built-in helpers:
import valohai

with valohai.output_properties() as properties:
    for i in range(100):
        filename = f"image_{i:03d}.jpg"
        # Save output file
        image.save(valohai.outputs().path(filename))
        # Add metadata
        properties.add(
            file=filename,
            properties={
                "quality_score": scores[i],
                "valohai.tags": ["processed"]
            }
        )

Method 3: API (After Execution)
Add or update metadata after execution completes using the Valohai API. Useful for validation workflows, quality gates, or manual approval steps.
Three API Endpoints
/api/v0/data/{id}/metadata/
One file, one metadata set: apply properties to a single datum

/api/v0/data/metadata/apply/
Multiple files, different metadata: apply different properties to each datum

/api/v0/data/metadata/apply-all/
Multiple files, same metadata: apply the same properties to all datums
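The bulk endpoints take their own payload shapes (see the Valohai API reference for the exact schemas). As a minimal sketch, the same effect as apply-all can also be had by looping over the single-datum endpoint; the helper names here are our own:

```python
import os
import requests

API_ROOT = 'https://app.valohai.com/api/v0'

def metadata_url(datum_id):
    # Single-datum endpoint: /api/v0/data/{id}/metadata/
    return f'{API_ROOT}/data/{datum_id}/metadata/'

def apply_same_properties(datum_ids, properties, token):
    """Apply identical properties to several datums, one request per datum."""
    for datum_id in datum_ids:
        response = requests.post(
            metadata_url(datum_id),
            json=properties,
            headers={'Authorization': f'Token {token}'},
        )
        response.raise_for_status()  # Surface HTTP errors early
```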
Quick Example
import os
import requests

properties = {
    "validation_score": 0.98,
    "approved_by": "data-team",
    "valohai.tags": ["validated"]
}

datum_id = "01234567-89ab-cdef-0123-456789abcdef"

response = requests.post(
    f'https://app.valohai.com/api/v0/data/{datum_id}/metadata/',
    json=properties,
    headers={
        'Authorization': 'Token ' + os.getenv('VH_TOKEN'),
        'Content-Type': 'application/json'
    }
)
response.raise_for_status()  # Surface HTTP errors early

Reserved Metadata Keys
Two keys have special meaning in Valohai:
valohai.dataset-versions
List of dataset version URLs; includes this datum in the dataset version

valohai.model-versions
List of model version URLs; includes this datum in the model version
All other keys are your custom properties.
Example Combining All Three
metadata = {
    # Reserved Valohai keys
    "valohai.tags": ["validated", "production", "resnet50"],
    "valohai.alias": "model-prod",
    "valohai.dataset-versions": ["dataset://big-data/processed"],

    # Your custom properties
    "accuracy": 0.95,
    "precision": 0.93,
    "recall": 0.97,
    "epochs": 100,
    "learning_rate": 0.001,
    "dataset_version": "v2.3",
    "training_duration_minutes": 145,
    "experiment_id": "exp-042"
}

Common Issues & Fixes
Metadata Not Appearing
For sidecar files:
Wrong filename → Must be output.ext.metadata.json (exact match plus suffix)
Not saved to /valohai/outputs/ → Save in the same directory as the output
Invalid JSON → Validate syntax (commas, quotes, brackets)

For the JSONL file:
Wrong filename → Must be exactly valohai.metadata.jsonl
Missing newlines → Add f.write('\n') after each json.dump()
Wrong structure → Each line must have {"file": "...", "metadata": {...}}
"Do I Need .metadata.json for Every File?"
No! This is the most common confusion. Here's the comparison:
Sidecar approach (1-2 files):

# Good for a small number of outputs
model.save('/valohai/outputs/model.pkl')
with open('/valohai/outputs/model.pkl.metadata.json', 'w') as f:
    json.dump(metadata, f)

JSONL approach (3+ files):

# Much better for many outputs
for i in range(100):
    image.save(f'/valohai/outputs/image_{i}.jpg')

# One metadata file for all 100 images
with open('/valohai/outputs/valohai.metadata.jsonl', 'w') as f:
    for i in range(100):
        json.dump({"file": f"image_{i}.jpg", "metadata": {...}}, f)
        f.write('\n')

Result:
Sidecar: 100 images = 200 files 😱
JSONL: 100 images = 101 files 😊
Next Steps
Now that you understand the metadata system, dive into specific use cases:
Organize Files with Tags — Label files for filtering and discovery
Create File Shortcuts with Aliases — Set up production references and team coordination
Track Custom Metadata — Store experiment results, quality metrics, and more
Related Pages
Save Files — Save outputs that can have metadata
Versioning and Lineage — Track file dependencies
Load Data in Jobs — Use files with metadata as inputs