Productivity Dashboard
Track organization-wide ML performance, demonstrate platform ROI, and make data-driven decisions with quantified metrics.

Why this matters
Most ML teams struggle to answer a deceptively simple question: "What's the ROI of our MLOps platform?"
You can stitch together custom dashboards from experiment trackers, cloud cost analyzers, and orchestration logs. But when your CFO asks for numbers, you'll spend days writing scripts to correlate metrics that don't tell a coherent story about platform value.
The Productivity Dashboard solves this by automatically capturing and visualizing the metrics that demonstrate why unified MLOps delivers measurably better outcomes than fragmented toolchains.
What you get
The dashboard shows platform value across four key dimensions:
Quantified platform ROI: Hard numbers on cost and time savings from intelligent automation.
Accelerated innovation cycles: Track how quickly projects go from kickoff to first approved model.
Operational excellence: Pipeline success rates and infrastructure efficiency demonstrate professional ML operations.
Governance and risk mitigation: Data provenance and model reproducibility rates prove you can meet compliance requirements.
Accessing the dashboard
Organization administrators can access the Productivity Dashboard from the organization menu.
The dashboard displays data for the last 30 days by default. Use the date range selector in the top-right corner to analyze different time periods.

Key metrics
Time to Value (TTV)
Shows the average number of days from project kickoff to the first completed job and approved model version.
Lower TTV means faster development cycles and competitive advantage. This metric demonstrates how the platform accelerates getting ML solutions to market compared to fragmented toolchains.
The trend indicator shows whether your TTV is improving or degrading during the selected period.
💡 Learn more about using models in Valohai.
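As a rough illustration of the idea behind this metric (not Valohai's internal formula), TTV can be thought of as the mean gap in days between each project's kickoff and its first approved model version. The data shape below is hypothetical:

```python
from datetime import date

def time_to_value_days(projects):
    """Average days from kickoff to first approved model version.

    `projects` is a list of (kickoff_date, first_approval_date) tuples —
    a hypothetical record shape used only for illustration.
    """
    gaps = [(approved - kickoff).days for kickoff, approved in projects]
    return sum(gaps) / len(gaps)

projects = [
    (date(2024, 1, 1), date(2024, 1, 15)),   # 14 days
    (date(2024, 2, 1), date(2024, 2, 21)),   # 20 days
]
print(time_to_value_days(projects))  # 17.0
```

A falling average over successive periods is what the dashboard's trend indicator would surface as an improving TTV.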
Job Reuse Savings
Displays the financial and time benefits of reusing matching configuration nodes from past executions.
This is value that only emerges from having a unified system that can intelligently match and reuse configurations. Instead of saying "the platform helps with efficiency," you can show exactly how much money and compute time you're saving.
Cost Savings: Total amount of money saved by reusing jobs instead of running new ones.
Time Savings: Total amount of time saved by reusing jobs.
Trend indicators show how both metrics changed during the selected period.
💡 Learn more about reusing nodes in Valohai.
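Conceptually, the savings figures are sums over the jobs that were matched and reused instead of re-run. A minimal sketch, assuming a hypothetical per-job record with the original run's cost and duration (not a Valohai API):

```python
def reuse_savings(reused_jobs):
    """Total cost (USD) and compute time (hours) saved by reusing past results.

    Each entry carries the original job's cost and runtime — hypothetical
    field names chosen for this illustration.
    """
    cost = sum(j["cost_usd"] for j in reused_jobs)
    hours = sum(j["runtime_hours"] for j in reused_jobs)
    return cost, hours

jobs = [
    {"cost_usd": 12.50, "runtime_hours": 3.0},
    {"cost_usd": 40.00, "runtime_hours": 8.5},
]
print(reuse_savings(jobs))  # (52.5, 11.5)
```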
Automated Pipeline Success Rate
Shows the percentage of automated pipelines that completed successfully.
Automated pipelines are those triggered via scheduled task or webhook. Higher success rates indicate greater operational reliability and demonstrate systematic risk management that protects project timelines and budgets.
The trend indicator reveals whether your success rate is improving during the selected period.
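The rate itself is a straightforward ratio of completed runs to all automated runs in the period. A sketch with hypothetical status strings:

```python
def pipeline_success_rate(statuses):
    """Percentage of automated pipeline runs that completed successfully.

    `statuses` is a list of status strings for the selected period;
    only "completed" counts as a success in this illustration.
    """
    if not statuses:
        return 0.0
    completed = sum(1 for s in statuses if s == "completed")
    return 100.0 * completed / len(statuses)

print(pipeline_success_rate(["completed", "completed", "error", "completed"]))  # 75.0
```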
Detailed analytics
Project Cost Breakdown
Bar chart showing total compute cost for the five most expensive projects, listed in descending order.
Use this to understand resource allocation and identify potential areas for cost optimization.
Average GPU Utilization
Gauge: Shows average GPU utilization across all jobs. Higher utilization indicates better resource efficiency.
Counter: Number of GPU utilization alerts triggered during the selected period. High alert counts may indicate underutilized resources or potential bottlenecks.
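To make the gauge and counter concrete, here is one way such a summary could be derived from per-job utilization percentages. The 20% alert threshold is a made-up example value, not Valohai's actual setting:

```python
def gpu_summary(utilizations, alert_threshold=20.0):
    """Average GPU utilization (%) and count of low-utilization alerts.

    `utilizations` holds one average utilization percentage per job;
    the alert threshold is a hypothetical example value.
    """
    avg = sum(utilizations) / len(utilizations)
    alerts = sum(1 for u in utilizations if u < alert_threshold)
    return avg, alerts

print(gpu_summary([85.0, 10.0, 65.0, 15.0]))  # (43.75, 2)
```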
Total Run Jobs
Chart displaying the total number of jobs run for the five largest projects.
Provides a clear view of workload distribution across your organization.
Job Reuse Rate
Shows the percentage of jobs that were reused for the five largest projects.
Higher reuse rates indicate greater efficiency by reducing the need to run new jobs for repeated tasks. This metric demonstrates one of the key advantages of using a unified MLOps platform.
Peak Waiting Time (per environment)
Chart showing the maximum time a job spent waiting in the queue before starting, broken down by the five slowest environments.
Helps identify potential bottlenecks in specific environments. Shorter wait times indicate more responsive infrastructure.
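In spirit, this chart takes the maximum queue wait per environment and ranks environments from slowest to fastest. A sketch over hypothetical (environment, wait-minutes) records:

```python
from collections import defaultdict

def peak_wait_by_environment(jobs, top_n=5):
    """Max queue wait (minutes) per environment, slowest first.

    `jobs` is a list of (environment_name, wait_minutes) tuples —
    a hypothetical record shape for illustration only.
    """
    peaks = defaultdict(float)
    for env, wait in jobs:
        peaks[env] = max(peaks[env], wait)
    return sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

jobs = [("gpu-a100", 42.0), ("cpu-small", 3.5), ("gpu-a100", 90.0)]
print(peak_wait_by_environment(jobs))  # [('gpu-a100', 90.0), ('cpu-small', 3.5)]
```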
Pipeline Status Overview
Breakdown of all pipeline statuses during the selected period:
Completed: Pipelines that finished successfully.
Stopped: Pipelines manually stopped by a user.
Error: Pipelines that encountered errors and failed to complete.
The chart displays both percentages and total counts for each status.
Most Used Datasets
Shows the five most frequently used datasets in your projects.
Helps identify the most valuable and critical datasets in your organization.
Data Provenance Tracking
Percentage of datasets with complete data lineage tracking.
Tracked: Files used by executions as inputs or outputs with complete lineage information.
Untracked: Loose files that are not used by any execution.
Higher tracking coverage is crucial for data governance and reproducibility.
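The coverage percentage follows directly from the tracked/untracked split described above. A minimal sketch, assuming a hypothetical per-file flag indicating whether any execution used the file as an input or output:

```python
def provenance_coverage(files):
    """Percentage of files with complete lineage tracking.

    Each file dict carries a hypothetical `used_by_executions` flag;
    files never used by an execution count as untracked.
    """
    if not files:
        return 0.0
    tracked = sum(1 for f in files if f["used_by_executions"])
    return 100.0 * tracked / len(files)

files = [{"used_by_executions": True}] * 9 + [{"used_by_executions": False}]
print(provenance_coverage(files))  # 90.0
```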
Model Reproducibility
Percentage of models that are fully reproducible.
Reproducible models can be recreated from their original code, data, and environment. Models imported and trained outside Valohai are not considered reproducible.
High reproducibility rates are essential for ensuring reliability and auditability of your models, especially for compliance and regulatory requirements.
What this tells stakeholders
The visual nature of these metrics makes them perfect for:
Executive presentations: Demonstrate platform ROI with concrete numbers on cost savings and time-to-value improvements.
Budget justifications: Show exactly where compute budget is being spent and how efficiency gains offset platform costs.
Cross-functional discussions: Provide shared visibility into ML operations performance for engineering, finance, and compliance teams.
Compliance reviews: Prove governance maturity through data provenance tracking and model reproducibility rates.
No configuration required
The Productivity Dashboard works out of the box with your existing Valohai setup; no additional configuration or instrumentation is needed.
All metrics are automatically collected as part of normal platform operation.
