Collect Metrics from Output Files (YOLO & Others)
Some frameworks like YOLOv8 write metrics to output files instead of printing them. Use a file watcher to monitor these files and stream their contents to Valohai metadata automatically.
This pattern works for any framework that writes metrics to CSV, JSON, or text files during training.
The Problem
YOLOv8 (and similar frameworks) write training metrics to CSV files in the outputs directory:
```
runs/train/exp/
├── results.csv          # Training metrics per epoch
├── weights/
│   ├── best.pt          # Best model
│   └── last.pt          # Latest model
└── ...
```

Challenge: These metrics aren't printed as JSON, so Valohai doesn't capture them automatically.
Solution: Use a file watcher script that monitors output files and prints their contents as JSON.
Quick Example
valohai.yaml
```yaml
- step:
    name: train-yolov8
    image: ultralytics/yolov8:latest
    command:
      - git clone https://github.com/ultralytics/yolov8.git
      - tar -xf /valohai/inputs/dataset/coco128.tar.xz
      - pip install watchdog
      - nohup python ./scripts/valohai_watch.py &  # Start watcher in background
      - python yolov8/train.py --data coco128.yaml --epochs {parameter:epochs}
    inputs:
      - name: dataset
        default: https://github.com/ultralytics/yolov8/releases/download/v1.0/coco128.tar.xz
    parameters:
      - name: epochs
        type: integer
        default: 10
    environment: aws-eu-west-1-g4dn-xlarge
```

scripts/valohai_watch.py
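A minimal version of the watcher could look like the sketch below. It assumes metrics arrive as CSV files under `/valohai/outputs/`; adjust the path and file filter to your framework.

```python
# scripts/valohai_watch.py -- minimal sketch (paths and names are illustrative)
import csv
import json
import time

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler


class MetricsHandler(FileSystemEventHandler):
    def on_modified(self, event):
        # React to CSV files only; ignore directories and other outputs
        if event.is_directory or not event.src_path.endswith(".csv"):
            return
        with open(event.src_path) as f:
            rows = list(csv.DictReader(f))
        if rows:
            # Print the latest row as single-line JSON so Valohai records it
            latest = {k.strip(): (v or "").strip() for k, v in rows[-1].items()}
            print(json.dumps(latest), flush=True)


def main():
    observer = Observer()
    observer.schedule(MetricsHandler(), "/valohai/outputs/", recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)  # keep the process alive while training runs
    finally:
        observer.stop()
        observer.join()

# Entry point when launched with `python valohai_watch.py`: main()
```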
How It Works
1. Start watcher in background: `nohup python valohai_watch.py &` runs the watcher script as a background process.
2. Monitor output directory: the watcher uses `watchdog` to detect file changes in `/valohai/outputs/`.
3. Parse and log: when a CSV is modified, the watcher reads the latest row and prints it as JSON.
4. Valohai captures: Valohai sees the printed JSON and records it as metadata.
Complete Working Example
Here's a full implementation with model aliasing:
scripts/valohai_watch.py (Complete)
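One possible implementation is sketched below. The deduplication keeps repeated CSV writes from logging the same epoch twice, and the aliasing step writes a `<file>.metadata.json` sidecar with a `valohai.alias` key — verify that sidecar convention against your Valohai version before relying on it. The alias name `yolov8-best` is illustrative.

```python
# scripts/valohai_watch.py -- fuller sketch; alias sidecar convention is an assumption
import csv
import json
import os
import time

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

OUTPUTS = "/valohai/outputs"


class ValohaiHandler(FileSystemEventHandler):
    def __init__(self):
        self.last_rows = {}  # path -> last row already logged (dedup)

    def on_modified(self, event):
        if event.is_directory:
            return
        if event.src_path.endswith(".csv"):
            self.log_csv(event.src_path)
        elif os.path.basename(event.src_path) == "best.pt":
            self.alias_model(event.src_path)

    def log_csv(self, path):
        try:
            with open(path) as f:
                rows = list(csv.DictReader(f))
        except (OSError, csv.Error):
            return  # file vanished or is mid-write; try again on next event
        if not rows:
            return
        latest = {k.strip(): (v or "").strip() for k, v in rows[-1].items()}
        if self.last_rows.get(path) != latest:  # skip rows already logged
            self.last_rows[path] = latest
            print(json.dumps(latest), flush=True)

    def alias_model(self, path):
        # Sidecar metadata file; "valohai.alias" assigns a datum alias to
        # the uploaded model (check your Valohai version's docs).
        with open(path + ".metadata.json", "w") as f:
            json.dump({"valohai.alias": "yolov8-best"}, f)


def main():
    observer = Observer()
    observer.schedule(ValohaiHandler(), OUTPUTS, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    finally:
        observer.stop()
        observer.join()

# Entry point when launched with `python valohai_watch.py`: main()
```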
Adapting for Other File Formats
JSON Files
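If your framework writes metrics to a JSON file instead of CSV, only the parsing changes. A sketch (the file layout — a dict, or a list of per-epoch dicts — is an assumption):

```python
import json


def log_json_metrics(path):
    # Re-emit a metrics JSON file as single-line JSON on stdout
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, list):  # e.g. one dict per epoch
        data = data[-1]         # log only the newest entry
    print(json.dumps(data), flush=True)
```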
Text Files with Key-Value Pairs
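For plain-text logs with lines like `loss: 0.123` or `mAP=0.7`, a small parser can collect the numeric values into one JSON record (the separators and file format here are assumptions):

```python
import json


def log_keyvalue_metrics(path):
    # Parse "key: value" or "key=value" lines into a single JSON record
    metrics = {}
    with open(path) as f:
        for line in f:
            for sep in (":", "="):
                if sep in line:
                    key, _, value = line.partition(sep)
                    try:
                        metrics[key.strip()] = float(value)
                    except ValueError:
                        pass  # non-numeric value, e.g. a timestamp: skip it
                    break
    if metrics:
        print(json.dumps(metrics), flush=True)
```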
TensorBoard Event Files
For TensorBoard logs, use the tensorboard library:
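A sketch using TensorBoard's `EventAccumulator` to read scalar events from an event-file directory and re-emit them as JSON (requires the `tensorboard` package to be installed):

```python
import json

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator


def log_tensorboard_scalars(logdir):
    # Load all scalar tags from a TensorBoard log directory
    accumulator = EventAccumulator(logdir)
    accumulator.Reload()
    for tag in accumulator.Tags().get("scalars", []):
        for event in accumulator.Scalars(tag):
            # One JSON line per scalar event, keyed by step and tag
            print(json.dumps({"step": event.step, tag: event.value}), flush=True)
```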
Best Practices
Start Watcher Before Training
Always start the watcher before your training script:
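In `valohai.yaml` that means listing the watcher command before the training command (step names and paths here are illustrative):

```yaml
command:
  - pip install watchdog
  - nohup python ./scripts/valohai_watch.py &   # watcher first, in background
  - python train.py                             # then training
```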
Use nohup for Background Execution
nohup ensures the watcher keeps running even if the parent process terminates:
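A typical invocation (the log file name is illustrative); without `nohup` the watcher may exit with its parent shell, while `nohup ... &` detaches it and keeps it running:

```bash
nohup python ./scripts/valohai_watch.py > watcher.log 2>&1 &
```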
Handle Partial Writes
Files might be written incrementally. Add a small delay:
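For example, a brief pause before reading gives the writer time to flush the current line (the delay value is a guess; tune it for your setup):

```python
import csv
import time


def read_latest_row(path, delay=0.2):
    # Give the writer a moment to finish flushing the current line
    time.sleep(delay)
    with open(path) as f:
        rows = list(csv.DictReader(f))
    return rows[-1] if rows else None
```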
Filter by Filename Pattern
Only watch specific files to avoid unnecessary processing:
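A simple allow-list check the event handler can call before doing any work (the patterns are illustrative — match them to your framework's output names):

```python
import fnmatch
import os

# Illustrative patterns; adjust to your framework's output file names
WATCHED_PATTERNS = ("results.csv", "*.metrics.json")


def should_process(path):
    # Only handle files whose basename matches one of the patterns
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(name, pattern) for pattern in WATCHED_PATTERNS)
```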
Error Handling
Always wrap file operations in try-except:
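For instance, a read helper that logs the problem to stderr instead of letting a transient file error kill the watcher:

```python
import csv
import json
import sys


def safe_log_latest(path):
    try:
        with open(path) as f:
            rows = list(csv.DictReader(f))
    except (OSError, csv.Error) as exc:
        # Never let a transient file error crash the watcher process
        print(f"watcher: skipping {path}: {exc}", file=sys.stderr)
        return
    if rows:
        print(json.dumps(rows[-1]), flush=True)
```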
Common Issues
Watcher Not Starting
Symptom: No metrics logged, watcher script never runs
Causes & Fixes:
- Missing dependency → install watchdog: `pip install watchdog`
- Script not in the correct location → check the path in your step's `command`
- Background process killed → use `nohup` and `&`

Debug: check the run logs to confirm the watcher process actually started, and inspect `nohup.out` for import errors or stack traces.
Metrics Logged Multiple Times
Symptom: Same epoch metrics appear repeatedly
Cause: CSV file modified multiple times per epoch
Solution: Track last processed row:
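A sketch of that deduplication — remember the last row printed per file and skip repeats:

```python
import csv
import json

_last_logged = {}  # path -> last row that was already printed


def log_if_new(path):
    # Print the latest CSV row only if it differs from the last one logged
    with open(path) as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return False
    latest = rows[-1]
    if _last_logged.get(path) == latest:
        return False  # this epoch's row was already logged: skip it
    _last_logged[path] = latest
    print(json.dumps(latest), flush=True)
    return True
```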
File Not Found Errors
Symptom: Watcher crashes when trying to read files
Cause: File deleted or moved before watcher can read it
Solution: Check file exists before reading:
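For example, guard the read with an existence check, since the file may be deleted or renamed between the filesystem event and the read:

```python
import csv
import json
import os


def log_csv_if_present(path):
    # The file may have been moved or deleted since the event fired
    if not os.path.exists(path):
        return
    with open(path) as f:
        rows = list(csv.DictReader(f))
    if rows:
        print(json.dumps(rows[-1]), flush=True)
```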
When to Use This Pattern
Use file watchers when:
- The framework writes metrics to files (YOLOv8, MMDetection, etc.)
- You can't modify the framework's code
- Metrics are in CSV, JSON, or structured text

Don't use file watchers when:
- You can modify your training code (use direct JSON printing instead)
- The framework has a callback/hook system (use callbacks)
- Metrics are printed to stdout (already captured by Valohai)
Example Project
Check out our complete working example on GitHub:
The repository includes:
- Complete watcher script
- YOLOv5 and YOLOv8 training configuration
- valohai.yaml with proper setup
- Step-by-step instructions
Next Steps
Visualize your metrics in Valohai
Compare experiments across different runs
Learn about model aliasing for production deployment
Back to Collect Metrics overview