Confusion Matrix
Confusion matrices show how your classifier performs across all classes. See where your model excels, where it struggles, and which classes get confused with each other.
Quick Start
1. Log Confusion Matrix Data
Print your confusion matrix as JSON with a specific format:
```python
from sklearn.metrics import confusion_matrix
import json

# Get predictions
y_pred = model.predict(X_test)

# Compute confusion matrix
matrix = confusion_matrix(y_true, y_pred)

# Log to Valohai
print(
    json.dumps(
        {
            "data": matrix.tolist(),
        },
    ),
)
```
Required format: {"data": [[row1], [row2], ...]}
Add labels
If you add a list of strings as the first item in data, those strings are used as class labels. Make sure the number of labels matches the number of rows and the number of items per row.
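A minimal sketch of a labeled matrix (the class names here are hypothetical placeholders):

```python
import json

# First item is the list of labels; the remaining items are the matrix rows
print(
    json.dumps(
        {
            "data": [
                ["cat", "dog", "bird"],
                [50, 2, 1],
                [3, 45, 2],
                [0, 4, 48],
            ],
        },
    ),
)
```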
2. View in Metadata Tab
Open your execution
Click the Metadata tab
Click the visualization dropdown (shows "Time Series" by default)
Select Confusion Matrix
The confusion matrix visualization appears automatically.

Complete Example
Binary Classification
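A minimal, self-contained sketch; the hard-coded labels are toy data chosen to produce the matrix interpreted below, and in practice y_true and y_pred come from your test set:

```python
from sklearn.metrics import confusion_matrix
import json

# Toy data: 5 negative (0) and 5 positive (1) samples
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

matrix = confusion_matrix(y_true, y_pred)  # [[4, 1], [1, 4]]
print(json.dumps({"data": matrix.tolist()}))
```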
Interpretation:
Top-left (4): True negatives
Top-right (1): False positives
Bottom-left (1): False negatives
Bottom-right (4): True positives
Multi-Class Classification
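A sketch with three hypothetical classes, combining the matrix with a label row as described above; the toy labels stand in for your own test data:

```python
from sklearn.metrics import confusion_matrix
import json

labels = ["cat", "dog", "bird"]  # hypothetical class names

# Toy data standing in for real test-set labels and predictions
y_true = ["cat", "cat", "cat", "dog", "dog", "dog", "bird", "bird", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "bird", "bird", "bird", "cat"]

matrix = confusion_matrix(y_true, y_pred, labels=labels)
print(json.dumps({"data": [labels, *matrix.tolist()]}))
```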
Interpretation:
Rows represent true labels
Columns represent predicted labels
Diagonal shows correct predictions
Off-diagonal shows misclassifications
With PyTorch
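A minimal sketch, assuming a trained `model` and a `test_loader` already exist in your script; predictions are collected over the loader and the matrix is computed with scikit-learn:

```python
import json
import torch
from sklearn.metrics import confusion_matrix

model.eval()  # assumed: a trained torch.nn.Module
all_true, all_pred = [], []
with torch.no_grad():
    for inputs, targets in test_loader:  # assumed: a DataLoader over the test set
        outputs = model(inputs)
        all_pred.extend(torch.argmax(outputs, dim=1).tolist())
        all_true.extend(targets.tolist())

matrix = confusion_matrix(all_true, all_pred)
print(json.dumps({"data": matrix.tolist()}))
```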
With TensorFlow/Keras
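Similarly for Keras, assuming a trained `model` plus `X_test`/`y_test` arrays; predicted class indices are taken with argmax over the output probabilities:

```python
import json
import numpy as np
from sklearn.metrics import confusion_matrix

probs = model.predict(X_test)      # assumed: a trained tf.keras model
y_pred = np.argmax(probs, axis=1)  # class index with the highest probability

matrix = confusion_matrix(y_test, y_pred)
print(json.dumps({"data": matrix.tolist()}))
```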
Troubleshooting
Confusion Matrix Not Appearing
Symptom: Only time series graphs appear, no confusion matrix option
Causes & Fixes:
Wrong JSON format: the printed object must use the "data" key exactly, i.e. {"data": [[...], ...]}
Matrix not converted to list: NumPy arrays are not JSON-serializable, so call .tolist() first (see the snippet below)
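A quick illustration of both fixes, with the broken variant shown as a comment:

```python
import json

# Wrong: key is not "data", and a raw NumPy array raises TypeError
# print(json.dumps({"matrix": matrix}))

# Right: matrix under the "data" key, converted to a plain list
print(json.dumps({"data": matrix.tolist()}))
```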
Matrix Shows Wrong Values
Symptom: Numbers don't match expected confusion matrix
Cause: Class labels in y_true and y_pred are not aligned (for example, one uses strings and the other integer codes)
Solution: Ensure true and predicted labels use the same encoding:
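One way to do this, sketched with a hypothetical label-to-id mapping applied to both arrays:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical mapping; use the same one for both arrays
label_to_id = {"negative": 0, "positive": 1}
y_true_ids = [label_to_id[label] for label in y_true]
y_pred_ids = [label_to_id[label] for label in y_pred]

matrix = confusion_matrix(y_true_ids, y_pred_ids)
```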
Matrix Shape Changes Between Epochs
Symptom: Early epochs have smaller matrices
Cause: Not all classes predicted yet
Solution: Specify all class labels explicitly:
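Passing scikit-learn's labels argument keeps the matrix at a fixed shape even when some classes have not been predicted yet (the class count here is an assumption):

```python
from sklearn.metrics import confusion_matrix

num_classes = 10  # assumed; use your model's class count
matrix = confusion_matrix(y_true, y_pred, labels=list(range(num_classes)))
```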
Next Steps
Visualize time series metrics alongside confusion matrices
Compare executions to see how different models handle class confusion
Compare output images for misclassified examples
Back to Visualize Metrics overview