Tags

Tags are simple text labels that help you categorize and quickly find files across your project.


When to Use Tags

Use tags for:

  • Experiment organization — Label outputs by experiment ID, iteration, or variant

  • Quality tracking — Mark files as validated, rejected, or needs-review

  • Environment management — Distinguish between dev, staging, and production data

  • Data sources — Track which dataset, factory, or pipeline produced the file

  • Team ownership — Identify which team or person is responsible

  • Model metadata — Label by architecture, framework, or approach


How to Add Tags

Add tags through a file's metadata under the reserved key valohai.tags. The value is a list of strings.

During Execution

import json

# Tags are a list of strings
metadata = {
    "valohai.tags": ["validated", "production", "experiment-42"]
}

# Save your output file (`model` is your own trained model object)
save_path = '/valohai/outputs/model.pkl'
model.save(save_path)

# Save metadata with tags
metadata_path = '/valohai/outputs/model.pkl.metadata.json'
with open(metadata_path, 'w') as f:
    json.dump(metadata, f)
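
If you tag outputs in several places, the sidecar step above can be wrapped in a small helper. This is a minimal sketch, not part of any Valohai library: the function name write_tag_sidecar is hypothetical, and the demo writes to a temporary directory so it can run outside an execution.

```python
import json
import tempfile
from pathlib import Path


def write_tag_sidecar(output_file, tags):
    """Write `<output_file>.metadata.json` next to the output, holding the tags.

    Hypothetical helper: it only mirrors the sidecar naming shown above.
    """
    output_file = Path(output_file)
    sidecar = output_file.with_name(output_file.name + ".metadata.json")
    with open(sidecar, "w") as f:
        json.dump({"valohai.tags": list(tags)}, f)
    return sidecar


# Demo in a temporary directory; inside an execution you would pass a
# path under /valohai/outputs/ instead.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    model_path.write_bytes(b"placeholder")  # stand-in for a real model file
    sidecar_path = write_tag_sidecar(model_path, ["validated", "production"])
    sidecar_name = sidecar_path.name
    saved = json.loads(sidecar_path.read_text())
```

In an execution, the call would look like write_tag_sidecar('/valohai/outputs/model.pkl', ["validated", "production"]).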

For Multiple Files

Use valohai.metadata.jsonl when tagging many files. Each line is one JSON object that names a file and carries its metadata:

import json

# Save all output files (`image` stands in for your own image object)
for i in range(100):
    image.save(f'/valohai/outputs/image_{i:03d}.jpg')

# Tag all files at once
metadata_path = '/valohai/outputs/valohai.metadata.jsonl'
with open(metadata_path, 'w') as f:
    for i in range(100):
        entry = {
            "file": f"image_{i:03d}.jpg",
            "metadata": {
                "valohai.tags": ["processed", "batch-2024-Q1", "validated"]
            }
        }
        json.dump(entry, f)
        f.write('\n')
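
Building the lines separately from writing them makes the format easy to inspect before anything lands in the outputs directory. The sketch below assumes the same entry layout as the example above ("file" plus "metadata"); the function name build_jsonl_lines is hypothetical.

```python
import json


def build_jsonl_lines(file_names, tags):
    """Return one JSON line per file, each carrying the same tag list."""
    return [
        json.dumps({"file": name, "metadata": {"valohai.tags": list(tags)}})
        for name in file_names
    ]


names = [f"image_{i:03d}.jpg" for i in range(3)]
lines = build_jsonl_lines(names, ["processed", "validated"])

# Each element is a complete JSON document with no embedded newline,
# so joining with "\n" yields valid JSON Lines content.
jsonl_text = "\n".join(lines) + "\n"
```

Writing jsonl_text to /valohai/outputs/valohai.metadata.jsonl then takes a single write call.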

💡 For complete details on metadata methods, see Add Context to Your Data Files


Find Files Using Tags

Web Application

Search and filter by tags in the Valohai UI:

  1. Navigate to your project's Data tab

  2. Use the search bar to find files by tag name

  3. Click on any file to see all its tags

  4. Filter by multiple tags to narrow results

When creating an execution, you can also search for tagged files in the input browser.


Common Tagging Patterns

Experiment Tracking

Label outputs by experiment details:

metadata = {
    "valohai.tags": [
        "experiment-123",
        "baseline",
        "resnet50",
        "2024-Q1"
    ]
}

Track which experiment produced each output and compare results across iterations.


Quality Labels

Mark validation and approval status:

metadata = {
    "valohai.tags": [
        "validated",
        "accuracy-95",
        "production-ready",
        "approved-by-ml-team"
    ]
}

Quickly identify files that have passed quality gates or are ready for production.


Environment Management

Separate outputs by deployment stage:

metadata = {
    "valohai.tags": [
        "staging",
        "preprocessing-v2",
        "ready-for-testing"
    ]
}

Keep development, staging, and production data clearly separated.


Data Source Tracking

Label files by origin:

metadata = {
    "valohai.tags": [
        "factory-eu-02",
        "production-line-A",
        "sensor-data",
        "quality-checked"
    ]
}

Track where data came from for traceability and compliance.


Team Ownership

Identify responsible teams or individuals:

metadata = {
    "valohai.tags": [
        "team-ml-research",
        "owner-alice",
        "project-vision",
        "requires-review"
    ]
}

Help teams coordinate on shared data and outputs.


Model Architecture

Organize models by technical approach:

metadata = {
    "valohai.tags": [
        "transformer",
        "pytorch",
        "fine-tuned",
        "multilingual"
    ]
}

Filter and compare models by architecture or framework.


Combining Multiple Categories

metadata = {
    "valohai.tags": [
        # Experiment
        "experiment-456",
        
        # Quality
        "validated",
        
        # Environment
        "staging",
        
        # Architecture
        "resnet50",
        
        # Team
        "team-cv",
        
        # Source
        "dataset-imagenet-v2"
    ]
}

Use multiple tag categories for flexible filtering and organization.
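
When tags come from different parts of your code, it can help to collect them per category and flatten them at the end. A minimal sketch: the helper name combine_tags and the category names are illustrative, and only the final flat list is what ends up under valohai.tags.

```python
def combine_tags(**categories):
    """Flatten per-category tag lists into one list without duplicates.

    Order follows the order of the keyword arguments, then the order
    of tags inside each category.
    """
    combined = []
    for tags in categories.values():
        for tag in tags:
            if tag not in combined:
                combined.append(tag)
    return combined


metadata = {
    "valohai.tags": combine_tags(
        experiment=["experiment-456"],
        quality=["validated"],
        environment=["staging"],
        team=["team-cv", "validated"],  # duplicate "validated" is dropped
    )
}
```

The category names exist only in your code. They are not stored with the file.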


Best Practices

Use Consistent Naming Conventions

# Good: Consistent, predictable patterns
"experiment-123"
"experiment-124"
"team-ml-research"
"team-data-engineering"

# Avoid: Inconsistent naming
"exp123"
"experiment_124"
"ML Research Team"
"data-eng"

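One way to keep names consistent is to normalize tags in code before writing them. The sketch below assumes the convention implied by the "Good" examples (lowercase words joined by hyphens). The function normalize_tag is hypothetical, and nothing here is enforced by Valohai itself.

```python
import re


def normalize_tag(raw):
    """Lowercase a tag and join its words with single hyphens."""
    tag = raw.strip().lower()
    tag = re.sub(r"[\s_]+", "-", tag)      # spaces and underscores become hyphens
    tag = re.sub(r"[^a-z0-9.-]", "", tag)  # drop any other punctuation
    tag = re.sub(r"-{2,}", "-", tag)       # collapse repeated hyphens
    return tag.strip("-")


cleaned = [normalize_tag(t) for t in ["experiment_124", "ML Research Team", " Team--CV "]]
```

A normalizer fixes casing and separators only. It cannot expand abbreviations such as "exp123" or "data-eng", so those still need an agreed naming scheme.
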
Keep Tags Concise

# Good: Short, scannable
"validated"
"prod"
"v2"

# Avoid: Long, verbose
"this-file-has-been-validated-by-the-quality-team"
"production-environment-version-2"

Use Tags for Filtering, Not Storage

# Good: Tags for categories
{"valohai.tags": ["validated", "high-accuracy"]}

# Wrong: Don't store data in tags
{"valohai.tags": ["accuracy=0.95", "epochs=100"]}  # Use properties instead

For detailed metrics, use custom properties instead.
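
To move existing key=value tags out of the tag list, you can split them into separate metadata keys. This sketch assumes that custom properties are ordinary keys stored next to valohai.tags in the same metadata object. The helper name split_tags_and_properties and the property names are illustrative.

```python
def split_tags_and_properties(raw_tags):
    """Separate plain tags from "key=value" items.

    Plain items stay in the tag list; "key=value" items become top-level
    metadata keys. Values are kept as strings, so convert them yourself
    if you need numbers.
    """
    tags = []
    properties = {}
    for item in raw_tags:
        key, sep, value = item.partition("=")
        if sep:
            properties[key] = value
        else:
            tags.append(item)
    return {"valohai.tags": tags, **properties}


metadata = split_tags_and_properties(["validated", "accuracy=0.95", "epochs=100"])
```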


