Confusion Matrix
Confusion matrices show how your classifier performs across all classes. See where your model excels, where it struggles, and which classes get confused with each other.
Quick Start
1. Log Confusion Matrix Data
Print your confusion matrix as JSON in a specific format:
from sklearn.metrics import confusion_matrix
import json
# Get predictions
y_pred = model.predict(X_test)
# Compute confusion matrix
matrix = confusion_matrix(y_true, y_pred)
# Log to Valohai
print(json.dumps({
    "data": matrix.tolist()
}))
Required format: {"data": [[row1], [row2], ...]}
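The matrix itself doesn't have to come from scikit-learn; any nested list of counts printed in this shape should work. A minimal hand-built sketch:
import json

# A hand-built 2x2 matrix: rows are true classes, columns are predictions
print(json.dumps({"data": [[50, 2], [3, 45]]}))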
Add labels
If you add a list of strings as the first item in the data, those strings are used as labels. Make sure the number of labels matches the number of rows (and items per row).
print(json.dumps({
    "data": [["y_true", "y_pred"], [50, 2], [3, 45]]
}))
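If your class names live on a fitted scikit-learn estimator, you can derive the label row from its classes_ attribute so the ordering always matches the matrix. A sketch, where clf stands in for your fitted classifier:
import json
from sklearn.metrics import confusion_matrix

# Assumption: clf is a fitted scikit-learn classifier
labels = [str(c) for c in clf.classes_]  # class names in matrix order
matrix = confusion_matrix(y_true, y_pred, labels=clf.classes_)
print(json.dumps({"data": [labels] + matrix.tolist()}))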
2. View in Metadata Tab
Open your execution
Click the Metadata tab
Click the visualization dropdown (shows "Time Series" by default)
Select Confusion Matrix
The confusion matrix visualization appears automatically.

Complete Example
Binary Classification
from sklearn.metrics import confusion_matrix
import numpy as np
import json
# True labels and predictions
y_true = [0, 1, 0, 1, 0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0, 1, 1, 1, 1, 0]
# Compute confusion matrix
matrix = confusion_matrix(y_true, y_pred)
# Log to Valohai
print(json.dumps({
    "data": matrix.tolist()
}))
# Output: {"data": [[4, 1], [1, 4]]}
Interpretation:
Top-left (4): True negatives
Top-right (1): False positives
Bottom-left (1): False negatives
Bottom-right (4): True positives
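If you also want the four counts as individual metadata values, the binary matrix unpacks in row-major order. A small sketch reusing y_true and y_pred from above (int() keeps the NumPy integers JSON-serializable):
# sklearn's binary matrix is [[TN, FP], [FN, TP]], so ravel() yields them in order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(json.dumps({"tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp)}))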
Multi-Class Classification
from sklearn.metrics import confusion_matrix
import json
# Example: 3-class problem
y_true = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]
# Compute confusion matrix
matrix = confusion_matrix(y_true, y_pred)
# Convert to list and log
result = matrix.tolist()
print(json.dumps({
    "data": result
}))
# Output: {"data": [[3, 0, 0], [0, 1, 2], [2, 1, 3]]}
Interpretation:
Rows represent true labels
Columns represent predicted labels
Diagonal shows correct predictions
Off-diagonal shows misclassifications
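To make the rows and columns readable in the UI, you can prepend the label row described under Add labels. A sketch reusing matrix from above; the class names are placeholders for your own:
# Placeholder names for classes 0, 1, 2 -- replace with your real class names
class_names = ["cat", "dog", "bird"]
print(json.dumps({"data": [class_names] + matrix.tolist()}))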
With PyTorch
import torch
from sklearn.metrics import confusion_matrix
import json
# After training/validation
model.eval()
all_preds = []
all_labels = []
with torch.no_grad():
    for data, labels in test_loader:
        outputs = model(data)
        _, predicted = torch.max(outputs, 1)
        all_preds.extend(predicted.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())
# Compute confusion matrix
matrix = confusion_matrix(all_labels, all_preds)
# Log to Valohai
print(json.dumps({
    "data": matrix.tolist()
}))
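If your dataset exposes class names, you can reuse them as the label row. A sketch reusing matrix from above; the classes attribute is an assumption that holds for torchvision ImageFolder-style datasets:
# Assumption: the dataset carries a `classes` attribute
# (true for torchvision ImageFolder-style datasets)
class_names = list(test_loader.dataset.classes)
print(json.dumps({"data": [class_names] + matrix.tolist()}))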
With TensorFlow/Keras
import tensorflow as tf
from sklearn.metrics import confusion_matrix
import json
# Get predictions
y_pred = model.predict(X_test)
y_pred_classes = tf.argmax(y_pred, axis=1).numpy()
y_true_classes = tf.argmax(y_test, axis=1).numpy()
# Compute confusion matrix
matrix = confusion_matrix(y_true_classes, y_pred_classes)
# Log to Valohai
print(json.dumps({
    "data": matrix.tolist()
}))
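The argmax on y_test assumes one-hot targets. If your test labels are already integer class ids, only the predictions need the argmax; a small variation on the snippet above, reusing the same imports:
# Assumption: y_test holds integer class ids rather than one-hot vectors
y_pred_classes = tf.argmax(model.predict(X_test), axis=1).numpy()
matrix = confusion_matrix(y_test, y_pred_classes)
print(json.dumps({"data": matrix.tolist()}))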
Troubleshooting
Confusion Matrix Not Appearing
Symptom: Only time series graphs appear, no confusion matrix option
Causes & Fixes:
Wrong JSON format:
# Wrong: Missing "data" key
print(json.dumps(matrix.tolist()))
# Correct: Must have "data" key
print(json.dumps({"data": matrix.tolist()}))Matrix not converted to list:
# Wrong: NumPy array not JSON serializable
print(json.dumps({"data": matrix}))
# Correct: Convert to list
print(json.dumps({"data": matrix.tolist()}))Matrix Shows Wrong Values
Symptom: Numbers don't match expected confusion matrix
Cause: Class labels not aligned
Solution: Ensure true and predicted labels use the same encoding:
import numpy as np
# If using one-hot encoded labels, convert both to class ids first
y_true_classes = np.argmax(y_true, axis=1)
y_pred_classes = np.argmax(y_pred, axis=1)
matrix = confusion_matrix(y_true_classes, y_pred_classes)
Matrix Shape Changes Between Epochs
Symptom: Early epochs have smaller matrices
Cause: Not all classes predicted yet
Solution: Specify all class labels explicitly:
matrix = confusion_matrix(
    y_true,
    y_pred,
    labels=list(range(num_classes))  # [0, 1, 2, ..., num_classes-1]
)
Next Steps
Visualize time series metrics alongside confusion matrices
Compare executions to see how different models handle class confusion
Compare output images for misclassified examples
Back to Visualize Metrics overview