Custom Properties

Custom properties let you attach any structured data to your files in JSON format. Use properties to track experiment results, data quality metrics, processing conditions, or any contextual information your team needs.


When to Use Properties

Properties store rich data beyond simple labels:

Experiment Tracking

Track hyperparameters, metrics, and training details:

{
    "model_architecture": "resnet50",
    "optimizer": "adam",
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 100,
    "final_loss": 0.023,
    "accuracy": 0.95,
    "precision": 0.93,
    "recall": 0.97,
    "training_time_minutes": 145
}

Data Quality

Record validation results and processing metrics:

{
    "input_rows": 10000,
    "output_rows": 9850,
    "rows_filtered": 150,
    "null_percentage": 0.02,
    "duplicate_percentage": 0.01,
    "quality_score": 0.985,
    "validation_passed": true,
    "processing_duration_seconds": 45
}

Production Context

Capture environmental and operational data:

{
    "factory_id": "eu-02",
    "production_line": "A",
    "batch_number": "2024-Q1-001",
    "operator_id": "OP-123",
    "temperature_celsius": 22.5,
    "humidity_percentage": 45,
    "timestamp": "2024-01-15T10:30:00Z",
    "quality_check_passed": true
}

How to Add Properties

Properties are added through metadata using any custom JSON keys, except the reserved keys valohai.tags, valohai.alias, valohai.model-versions, and valohai.dataset-versions.
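If you assemble metadata dynamically, it can be worth checking that none of your custom keys collide with the reserved ones, since those are interpreted specially rather than stored as plain properties. A minimal sketch (the find_reserved_keys helper is illustrative, not part of Valohai):

```python
RESERVED_KEYS = {
    "valohai.tags",
    "valohai.alias",
    "valohai.model-versions",
    "valohai.dataset-versions",
}

def find_reserved_keys(metadata):
    """Return any top-level keys that Valohai treats as reserved."""
    return sorted(set(metadata) & RESERVED_KEYS)

custom = {"accuracy": 0.95, "valohai.tags": ["oops"]}
clashes = find_reserved_keys(custom)
if clashes:
    print(f"Note: these keys have special meaning to Valohai: {clashes}")
```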

During Execution

import json

# Your custom properties (any JSON structure)
metadata = {
    "accuracy": 0.95,
    "precision": 0.93,
    "recall": 0.97,
    "epochs": 100,
    "learning_rate": 0.001,
    "hyperparameters": {
        "dropout": 0.5,
        "optimizer": "adam",
        "batch_size": 32
    },
    "training_history": [
        {"epoch": 1, "loss": 0.5},
        {"epoch": 2, "loss": 0.3}
    ]
}

# Save your output file (assuming `model` is your trained model object)
save_path = '/valohai/outputs/model.pkl'
model.save(save_path)

# Save metadata with properties
metadata_path = '/valohai/outputs/model.pkl.metadata.json'
with open(metadata_path, 'w') as f:
    json.dump(metadata, f)

Combine with Tags and Aliases

import json

metadata = {
    # Reserved Valohai keys
    "valohai.tags": ["validated", "production"],
    "valohai.alias": "model-prod",
    "valohai.dataset-versions": ["dataset://big-data/latest"],
    
    # Your custom properties
    "accuracy": 0.95,
    "dataset_version": "v2.3",
    "training_duration_minutes": 145,
    "gpu_type": "V100",
    "experiment_notes": "Increased batch size for better convergence"
}

# Save your output file (`model` is your trained model object)
save_path = '/valohai/outputs/model.pkl'
model.save(save_path)

metadata_path = '/valohai/outputs/model.pkl.metadata.json'
with open(metadata_path, 'w') as f:
    json.dump(metadata, f)

For Multiple Files

Use a single valohai.metadata.jsonl file to attach properties to many files at once:

import json

# Save all output files (assuming `images`, `quality_scores`, and
# `processing_times` were produced by your own processing loop)
for i in range(100):
    images[i].save(f'/valohai/outputs/image_{i:03d}.jpg')

# Add properties to all files
metadata_path = '/valohai/outputs/valohai.metadata.jsonl'
with open(metadata_path, 'w') as f:
    for i in range(100):
        entry = {
            "file": f"image_{i:03d}.jpg",
            "metadata": {
                "quality_score": quality_scores[i],
                "processing_time_ms": processing_times[i],
                "resolution": "1920x1080",
                "format": "JPEG",
                "compression": 85
            }
        }
        json.dump(entry, f)
        f.write('\n')

💡 For complete details on metadata methods, see Add Context to Your Data Files.


Read Properties in Code

Access metadata from input files during execution to make data-driven decisions.

Access Input Metadata

import json

# Load input configuration
with open('/valohai/config/inputs.json') as f:
    vh_inputs_config = json.load(f)

# Access properties from input named "model"
for file_data in vh_inputs_config['model']['files']:
    metadata = file_data.get('metadata', {})  # metadata may be absent for some files
    
    # Use properties for conditional logic
    if metadata.get('validation_score', 0) > 0.9:
        print(f"High quality model: {file_data['name']}")
        model_path = file_data['path']
    
    # Log metadata for tracking
    print(f"Model trained with {metadata.get('epochs')} epochs")
    print(f"Accuracy: {metadata.get('accuracy')}")

Use Cases for Reading Properties

Filter inputs by quality:

# Only process high-quality data
high_quality_files = [
    f for f in vh_inputs_config['dataset']['files']
    if f['metadata'].get('quality_score', 0) > 0.95
]

Conditional processing:

# Use different logic based on data source
if metadata.get('factory_id') == 'eu-02':
    apply_eu_preprocessing()
else:
    apply_us_preprocessing()

Audit trails:

# Log processing context
logging.info(f"Processing batch {metadata.get('batch_number')}")
logging.info(f"Source: {metadata.get('production_line')}")
logging.info(f"Quality: {metadata.get('quality_check_passed')}")

Add Properties via API

Add or update properties after execution completes using the Valohai API.

Single Datum

Apply properties to one file:

import os
import requests

properties = {
    "validation_score": 0.98,
    "approved_by": "data-team",
    "approval_date": "2024-01-15",
    "notes": "Passed all quality checks"
}

datum_id = "01234567-89ab-cdef-0123-456789abcdef"

response = requests.post(
    f'https://app.valohai.com/api/v0/data/{datum_id}/metadata/',
    json=properties,
    headers={
        'Authorization': 'Token ' + os.getenv('VH_TOKEN'),
        'Content-Type': 'application/json'
    }
)
print(f"Status: {response.status_code}")

Multiple Datums (Different Properties)

Apply different properties to each file:

import os
import requests

payload = {
    "datum_metadata": {
        "datum-id-1": {
            "quality_score": 0.95,
            "validation_status": "passed",
            "reviewer": "alice"
        },
        "datum-id-2": {
            "quality_score": 0.87,
            "validation_status": "passed",
            "reviewer": "bob"
        },
        "datum-id-3": {
            "quality_score": 0.65,
            "validation_status": "failed",
            "reviewer": "alice",
            "failure_reason": "low accuracy"
        }
    }
}

response = requests.post(
    'https://app.valohai.com/api/v0/data/metadata/apply/',
    json=payload,
    headers={
        'Authorization': 'Token ' + os.getenv('VH_TOKEN'),
        'Content-Type': 'application/json'
    }
)
print(f"Status: {response.status_code}")

Multiple Datums (Same Properties)

Apply the same properties to all files:

import os
import requests

payload = {
    "metadata": {
        "processing_version": "v2.1",
        "validated": True,
        "validation_date": "2024-01-15",
        "validator": "automated-pipeline"
    },
    "datum_ids": [
        "datum-id-1",
        "datum-id-2",
        "datum-id-3"
    ]
}

response = requests.post(
    'https://app.valohai.com/api/v0/data/metadata/apply-all/',
    json=payload,
    headers={
        'Authorization': 'Token ' + os.getenv('VH_TOKEN'),
        'Content-Type': 'application/json'
    }
)
print(f"Status: {response.status_code}")

💡 API Token: Get your API token from your Valohai account settings. See Make calls to the Valohai API for details.


Update or Remove Properties

Set a property's value to None (serialized as JSON null) to remove it:

import os
import requests

properties = {
    "old_metric": None,  # Remove this property
    "deprecated_field": None,  # Remove this property
    "new_metric": 0.92,  # Add or update
    "validation_status": "re-approved"  # Update
}

datum_id = "01234567-89ab-cdef-0123-456789abcdef"

response = requests.post(
    f'https://app.valohai.com/api/v0/data/{datum_id}/metadata/',
    json=properties,
    headers={
        'Authorization': 'Token ' + os.getenv('VH_TOKEN'),
        'Content-Type': 'application/json'
    }
)

View Properties

Web Application

  1. Navigate to your project's Data tab

  2. Click on any file to open details

  3. Scroll to the Properties section

  4. Search or filter properties by key

  5. Hover over values to see full content
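The web UI lets you search and filter properties by key; the same kind of filtering is easy to do in code over a metadata dict you have already loaded (for example, from inputs.json as shown earlier). A minimal local sketch with no API calls; the filter_properties helper is illustrative:

```python
def filter_properties(metadata, prefix=""):
    """Return properties whose keys start with the given prefix."""
    return {k: v for k, v in metadata.items() if k.startswith(prefix)}

# Example record, mirroring the properties shown earlier
metadata = {
    "quality_score": 0.95,
    "validation_status": "passed",
    "reviewer": "alice",
}
print(filter_properties(metadata, prefix="validation"))
# {'validation_status': 'passed'}
```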


Common Property Patterns

Experiment Metadata

metadata = {
    # Model architecture
    "model_type": "resnet50",
    "framework": "pytorch",
    "framework_version": "2.0.1",
    
    # Hyperparameters
    "optimizer": "adam",
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 100,
    "dropout": 0.5,
    
    # Results
    "final_loss": 0.023,
    "accuracy": 0.95,
    "precision": 0.93,
    "recall": 0.97,
    "f1_score": 0.95,
    
    # Resources
    "training_time_minutes": 145,
    "gpu_type": "V100",
    "num_gpus": 4,
    
    # Context
    "experiment_id": "exp-042",
    "researcher": "alice",
    "notes": "Best performing model so far"
}

Data Quality Metadata

metadata = {
    # Volume
    "input_rows": 10000,
    "output_rows": 9850,
    "rows_filtered": 150,
    "rows_deduplicated": 100,
    
    # Quality metrics
    "null_percentage": 0.02,
    "duplicate_percentage": 0.01,
    "outlier_percentage": 0.005,
    "quality_score": 0.985,
    
    # Validation
    "validation_passed": True,
    "validation_checks": [
        "schema_valid",
        "no_nulls_in_key_columns",
        "date_range_valid"
    ],
    
    # Processing
    "processing_duration_seconds": 45,
    "data_version": "v2.3",
    "processing_date": "2024-01-15"
}

Production Metadata

metadata = {
    # Source
    "factory_id": "eu-02",
    "production_line": "A",
    "batch_number": "2024-Q1-001",
    "operator_id": "OP-123",
    
    # Conditions
    "temperature_celsius": 22.5,
    "humidity_percentage": 45,
    "pressure_mbar": 1013,
    
    # Quality
    "quality_check_passed": True,
    "defect_rate": 0.002,
    "inspector": "QC-456",
    
    # Timestamps
    "production_start": "2024-01-15T08:00:00Z",
    "production_end": "2024-01-15T16:00:00Z",
    "inspection_time": "2024-01-15T16:30:00Z"
}

Best Practices

Use Consistent Keys

# Good: Consistent naming
"learning_rate"
"batch_size"
"accuracy"

# Avoid: Inconsistent naming
"learningRate"
"batch-size"
"Accuracy"
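When metadata comes from several scripts or teams, a small normalizer can enforce one convention at write time. A minimal sketch, assuming snake_case as the house style (the helpers are illustrative):

```python
import re

def to_snake_case(key):
    """Normalize a property key to snake_case."""
    key = key.replace("-", "_").replace(" ", "_")
    key = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", key)
    return key.lower()

def normalize_keys(metadata):
    """Return a copy of the metadata dict with snake_case keys."""
    return {to_snake_case(k): v for k, v in metadata.items()}

print(normalize_keys({"learningRate": 0.001, "batch-size": 32, "Accuracy": 0.95}))
# {'learning_rate': 0.001, 'batch_size': 32, 'accuracy': 0.95}
```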

Structure Nested Data

# Good: Organized structure
{
    "hyperparameters": {
        "learning_rate": 0.001,
        "batch_size": 32,
        "optimizer": "adam"
    },
    "metrics": {
        "accuracy": 0.95,
        "precision": 0.93,
        "recall": 0.97
    }
}

# Avoid: Flat and unclear
{
    "hp_lr": 0.001,
    "hp_bs": 32,
    "m_acc": 0.95,
    "m_prec": 0.93
}

Include Units

# Good: Clear units
{"training_time_minutes": 145}
{"temperature_celsius": 22.5}
{"file_size_mb": 234.5}

# Avoid: Ambiguous
{"training_time": 145}  # seconds? minutes? hours?
{"temperature": 22.5}  # celsius? fahrenheit?
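A lightweight check can flag keys that look like measurements but carry no unit suffix. A minimal sketch; the stem and suffix lists are illustrative and should match your own conventions:

```python
UNIT_SUFFIXES = ("_seconds", "_minutes", "_hours", "_ms",
                 "_celsius", "_fahrenheit", "_mb", "_gb",
                 "_percentage", "_mbar")
AMBIGUOUS_STEMS = ("time", "duration", "temperature", "size", "pressure")

def ambiguous_keys(metadata):
    """Return keys that mention a measurement but lack a unit suffix."""
    return [
        k for k in metadata
        if any(stem in k for stem in AMBIGUOUS_STEMS)
        and not k.endswith(UNIT_SUFFIXES)
    ]

print(ambiguous_keys({"training_time": 145, "temperature_celsius": 22.5}))
# ['training_time']
```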


Next Steps

  • Learn how to use tags alongside properties

  • Create aliases pointing to files with rich metadata

  • Set up datasets to group files by properties
