Visualize Utilization

Hardware utilization is automatically visualized on the execution page, letting you monitor CPU, memory, and GPU usage in real time.

Understanding the summary view

The collapsed view shows a quick overview of peak resource utilization.

Resource utilization summary as displayed on the execution page. Click the ▼ to expand.

Reading the colored bars

Each resource is shown as a bar with three colored utilization ranges and a black line marking the peak:

  • Black line: Peak utilization of the resource.

  • Green (1–70%): Low to moderate utilization.

  • Orange (70–90%): High utilization.

  • Red (90–100%): Maximum utilization.

💡 For GPU-intensive workloads, red is what you want to see: it means you're using the full capacity you're paying for.
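
The color ranges above can be expressed as a small lookup, which may be handy when post-processing your own numbers. This is a minimal sketch, not Valohai's implementation: the function name is hypothetical, and because the listed ranges share their boundary values, it assumes that 70% counts as orange and 90% counts as red.

```python
def utilization_zone(percent: float) -> str:
    """Map a utilization percentage to the color zone it falls in.

    Assumption: shared boundary values go to the higher zone
    (70 -> orange, 90 -> red). The page itself does not specify this.
    """
    if not 0 <= percent <= 100:
        raise ValueError(f"utilization must be within 0-100, got {percent}")
    if percent >= 90:
        return "red"
    if percent >= 70:
        return "orange"
    return "green"


for peak in (42.0, 85.0, 97.5):
    print(peak, utilization_zone(peak))
```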

Detailed charts

Click the ▼ to expand the full hardware statistics view.

Expanded hardware statistics

Chart layout

  • Left chart: Computational resources (CPU and GPU processing power).

  • Right chart: Memory resources (system memory and GPU memory).

Understanding the visualizations

  • Gray areas: Min-max range of values over each 2-minute interval.

  • Colored lines: Average values over each interval.

Hover over the lines to see exact values at specific timestamps.

Click legend items to hide the corresponding metric for clearer comparison.
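
To make the gray areas and colored lines concrete, the sketch below groups raw samples into 2-minute intervals and computes the minimum, average, and maximum for each one. It only illustrates the aggregation described above. The `(timestamp, value)` sample format and the `summarize` function are assumptions, not part of Valohai.

```python
from collections import defaultdict
from statistics import mean

INTERVAL_SECONDS = 120  # the 2-minute interval used by the charts


def summarize(samples, interval=INTERVAL_SECONDS):
    """Group (timestamp_seconds, value) samples into fixed intervals.

    Returns a list of (interval_start, minimum, average, maximum) tuples,
    sorted by interval start. The min-max pair corresponds to the gray
    area, the average to the colored line.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        start = int(timestamp // interval) * interval
        buckets[start].append(value)
    return [
        (start, min(values), mean(values), max(values))
        for start, values in sorted(buckets.items())
    ]


# Hypothetical GPU utilization samples: (seconds since start, percent).
gpu_samples = [(0, 10.0), (60, 30.0), (130, 50.0)]
for row in summarize(gpu_samples):
    print(row)
```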

Chart actions

  • Navigate: Use chart controls to zoom in on specific time periods.

  • Download: Export statistics as a JSON file for further analysis.

  • API access: Statistics are also available through the Valohai API for programmatic analysis.
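
Once you have downloaded the statistics, a few lines of Python are usually enough for a first look. The sketch below reports the peak value of every numeric field. The layout of the exported JSON is not described on this page, so the sketch assumes a list of flat records, and the field names `cpu` and `gpu` are placeholders. Adjust both to match your actual file.

```python
import json


def peak_per_metric(records):
    """Return the highest value seen for each numeric field.

    Assumes `records` is a list of flat dictionaries. Non-numeric
    fields (and booleans) are ignored.
    """
    peaks = {}
    for record in records:
        for key, value in record.items():
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            peaks[key] = max(value, peaks.get(key, value))
    return peaks


# Placeholder data standing in for the contents of a downloaded file.
# With a real export you would use json.load() on the opened file instead.
example = json.loads('[{"cpu": 35.0, "gpu": 91.5}, {"cpu": 72.0, "gpu": 88.0}]')
print(peak_per_metric(example))
```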

What good utilization looks like

CPU-intensive training

Expect CPU usage in the red zone (90–100%) during data preprocessing or single-threaded training.

GPU-intensive training

Look for GPU processor usage consistently in the red zone. GPU memory should be high but may not hit 100% depending on batch size.

Data loading bottlenecks

If GPU utilization drops while CPU usage spikes, you may have a data loading bottleneck. Consider increasing data loader workers or using faster storage.
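
If you are working with exported statistics, the same pattern can be checked programmatically. The following is a rough heuristic sketch. The sample format, the function name, and both thresholds are assumptions chosen for illustration, so tune them for your workload.

```python
def data_loading_suspects(samples, gpu_low=50.0, cpu_high=90.0):
    """Return timestamps where GPU usage is low while CPU usage is high.

    `samples` is a list of (timestamp, cpu_percent, gpu_percent) tuples.
    The default thresholds are illustrative, not Valohai defaults.
    """
    return [
        timestamp
        for timestamp, cpu, gpu in samples
        if gpu < gpu_low and cpu > cpu_high
    ]


# Hypothetical samples: the GPU idles at t=240 and t=360 while the CPU is saturated.
samples = [
    (0, 40.0, 95.0),
    (120, 55.0, 93.0),
    (240, 97.0, 30.0),
    (360, 98.0, 25.0),
    (480, 50.0, 94.0),
]
print(data_loading_suspects(samples))
```

A long run of flagged timestamps suggests the GPU is waiting on input rather than computing.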

Memory leaks

If memory usage steadily climbs without plateauing, you may have a memory leak in your training code.
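
One way to tell a steady climb from a plateau is to fit a line to the second half of the run and look at its slope. The sketch below does this with a plain least-squares fit. The function names and the 1 percentage point per hour threshold are assumptions for illustration, not a Valohai feature.

```python
def slope(xs, ys):
    """Least-squares slope of ys over xs (0.0 if all xs are identical)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    denominator = sum((x - mean_x) ** 2 for x in xs)
    if denominator == 0:
        return 0.0
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return numerator / denominator


def looks_like_leak(timestamps, memory_percent, min_slope_per_hour=1.0):
    """Flag runs whose memory usage is still rising in the second half.

    `timestamps` are in seconds. A run that plateaus has a near-zero
    slope in its second half and is not flagged.
    """
    if len(timestamps) < 4:
        return False  # too few samples to judge a trend
    half = len(timestamps) // 2
    tail_slope = slope(timestamps[half:], memory_percent[half:])
    return tail_slope * 3600 >= min_slope_per_hour


times = [0, 600, 1200, 1800, 2400, 3000]
print(looks_like_leak(times, [20, 25, 30, 35, 40, 45]))  # still climbing
print(looks_like_leak(times, [20, 40, 60, 60, 60, 60]))  # plateaued
```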

What's next

  • Track underutilization: Automatically identify executions with low resource usage and get recommendations for cheaper instances.

  • Hardware statistics overview: Understand what metrics Valohai collects and how they're stored.
