Visualize Utilization
Hardware utilization is automatically visualized on the execution page, so you can monitor CPU, memory, and GPU usage in real time.
Understanding the summary view
The collapsed view shows a quick overview of peak resource utilization.

Reading the colored bars
Each resource is shown as a bar with the following marker and color ranges:
Black line: Peak utilization of the resource.
Green (1–70%): Low to moderate utilization.
Orange (70–90%): High utilization.
Red (90–100%): Maximum utilization.
💡 For GPU-intensive workloads, you want to see red: it means you're using the full GPU capacity you're paying for.
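If you post-process utilization numbers yourself, the color ranges above can be expressed as a small helper. This is a sketch, not Valohai code. The page lists overlapping boundaries (70 and 90 appear in two ranges), so the helper assumes each lower bound is inclusive: 70% counts as orange and 90% as red. The function name `utilization_zone` is hypothetical.

```python
def utilization_zone(percent: float) -> str:
    """Map a utilization percentage to the color ranges of the summary bars.

    Assumption: lower bounds are inclusive, so 70 -> orange and 90 -> red.
    0% is treated as green even though the documented range starts at 1%.
    """
    if not 0 <= percent <= 100:
        raise ValueError(f"utilization must be between 0 and 100, got {percent}")
    if percent < 70:
        return "green"   # low to moderate utilization
    if percent < 90:
        return "orange"  # high utilization
    return "red"         # maximum utilization


for sample in (35, 82, 97):
    print(sample, utilization_zone(sample))
```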
Detailed charts
Click the ▼ to expand the full hardware statistics view.

Chart layout
Left chart: Computational resources (CPU and GPU processing power).
Right chart: Memory resources (system memory and GPU memory).
Understanding the visualizations
Gray areas: Min-max range of values over each 2-minute interval.
Colored lines: Average values over each interval.
Hover over the lines to see exact values at specific timestamps.
Click legend items to hide the corresponding metric for clearer comparison.
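To see how a gray band and an average line relate to raw samples, here is a minimal sketch of 2-minute windowed aggregation. It assumes raw samples arrive as `(timestamp_seconds, value)` pairs; Valohai's actual sampling rate and aggregation code are not described on this page, and the `aggregate` function is hypothetical.

```python
from collections import defaultdict

INTERVAL_SECONDS = 120  # the charts summarize each 2-minute interval


def aggregate(samples, interval=INTERVAL_SECONDS):
    """Group (timestamp_seconds, value) samples into fixed-length windows.

    Returns (window_start, minimum, maximum, mean) tuples sorted by time:
    minimum/maximum bound the gray area, mean is the colored line.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        window_start = int(timestamp // interval) * interval
        buckets[window_start].append(value)
    return [
        (start, min(vals), max(vals), sum(vals) / len(vals))
        for start, vals in sorted(buckets.items())
    ]


raw = [(0, 10), (60, 30), (119, 20), (120, 50), (200, 70)]
for row in aggregate(raw):
    print(row)
```

A wide gray band around a flat average line usually means the resource is bursty within each interval, which the average alone would hide.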
Chart actions
Navigate: Use chart controls to zoom in on specific time periods.
Download: Export statistics as a JSON file for further analysis.
API access: Statistics are also available through the Valohai API for programmatic analysis.
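This page does not describe the schema of the downloaded JSON file, so the sketch below assumes a hypothetical layout: a list of records, each with a `time` field plus one numeric field per metric (for example `cpu_percent` or `gpu_percent`). Adjust the keys to match your actual export. The functions `load_records` and `summarize` are illustrative names, not part of any Valohai library.

```python
import json
from statistics import mean


def load_records(path):
    """Read a downloaded statistics file (HYPOTHETICAL schema: a JSON list of dicts)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(records):
    """Return {metric: (peak, average)} for every numeric field except 'time'."""
    series = {}
    for record in records:
        for key, value in record.items():
            if key == "time" or not isinstance(value, (int, float)):
                continue
            series.setdefault(key, []).append(value)
    return {key: (max(vals), mean(vals)) for key, vals in series.items()}


example = [
    {"time": 0, "cpu_percent": 40, "gpu_percent": 90},
    {"time": 120, "cpu_percent": 60, "gpu_percent": 100},
]
print(summarize(example))
```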
What good utilization looks like
CPU-intensive training
Expect CPU usage in the red zone (90–100%) during data preprocessing or single-threaded training.
GPU-intensive training
Look for GPU processor usage consistently in the red zone. GPU memory should be high but may not hit 100% depending on batch size.
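"Consistently in the red zone" is a judgment call. One way to put a number on it is the share of samples at or above 90%, sketched below with made-up GPU values. The function name and the sample data are hypothetical; what share counts as good enough depends on your workload.

```python
def red_zone_share(values, threshold=90.0):
    """Fraction of utilization samples at or above `threshold` percent."""
    if not values:
        raise ValueError("no samples provided")
    return sum(1 for v in values if v >= threshold) / len(values)


gpu_samples = [95, 97, 40, 99, 96]  # hypothetical per-interval GPU averages
print(f"{red_zone_share(gpu_samples):.0%} of samples in the red zone")
```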
Data loading bottlenecks
If GPU utilization drops while CPU usage spikes, you may have a data loading bottleneck. Consider increasing data loader workers or using faster storage.
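The pattern above can be checked mechanically with a simple threshold rule: flag intervals where GPU utilization is low while CPU utilization is near its maximum. This is a rough approximation of "drops" and "spikes". The 50% and 90% thresholds and the function name are illustrative assumptions, not Valohai defaults.

```python
def find_loading_bottlenecks(gpu, cpu, gpu_low=50.0, cpu_high=90.0):
    """Return interval indices where GPU is below gpu_low and CPU is at or above cpu_high.

    gpu and cpu are equally long lists of per-interval averages, in percent.
    """
    if len(gpu) != len(cpu):
        raise ValueError("gpu and cpu series must have the same length")
    return [
        i
        for i, (g, c) in enumerate(zip(gpu, cpu))
        if g < gpu_low and c >= cpu_high
    ]


gpu = [95, 30, 96, 20]
cpu = [50, 95, 55, 70]
print(find_loading_bottlenecks(gpu, cpu))
```

Intervals flagged this way are good places to compare against your data loader settings; an index with low GPU but moderate CPU (like the last one here) points to a different cause, such as waiting on I/O.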
Memory leaks
If memory usage steadily climbs without plateauing, you may have a memory leak in your training code.
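A rough way to test for "climbs without plateauing" is to fit a least-squares slope to the first and second halves of the memory series: a plateau shows up as a near-zero slope in the later half, while a leak keeps climbing in both. This heuristic, the `min_slope` threshold (memory units per minute), and the function names are assumptions for illustration only.

```python
def memory_trend(values, interval_minutes=2.0):
    """Least-squares slope of evenly spaced samples, in units per minute."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [i * interval_minutes for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def looks_like_leak(values, interval_minutes=2.0, min_slope=0.05):
    """Flag a series whose slope stays above min_slope in both halves."""
    half = len(values) // 2
    if half < 2:
        return False  # too few samples to judge a trend
    first = memory_trend(values[:half], interval_minutes)
    second = memory_trend(values[half:], interval_minutes)
    return first > min_slope and second > min_slope


leaking = [1.0 + 0.5 * i for i in range(10)]      # keeps growing
plateau = [1, 2, 3, 4, 5, 5, 5, 5, 5, 5]          # grows, then levels off
print(looks_like_leak(leaking), looks_like_leak(plateau))
```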
What's next
Track underutilization to automatically identify executions with low resource usage and get recommendations for cheaper instances.
Hardware statistics overview to understand what metrics Valohai collects and how they're stored.
