ROC & Confusion Matrix

Manual: set TPR, FPR, and prevalence directly.
Quick starts
Custom data | Built-in
Expected file shape
Use one row per example. Required fields are label and score. label must be 0 or 1. score should be between 0 and 1. JSON also accepts aliases like y_true/target and prob/pred.
CSV
label,score
1,0.97
0,0.32
1,0.88
JSON
[
  {"label": 1, "score": 0.97},
  {"label": 0, "score": 0.32}
]
Using the built-in simulator.
TPR 85%
FPR 20%
Prevalence 30%
Score Model — Gaussian curves affect the distributions & AUC in Manual mode.
Manual | Gaussian
Threshold
μ Neg 0.0
σ Neg 1.0
μ Pos 2.0
σ Pos 1.0
N 1000
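The Gaussian score model controlled by the sliders above can be sketched as follows — a minimal simulator assuming each example's score is drawn from the negative or positive Gaussian and compared to the threshold. The function and parameter names are illustrative, not the page's actual implementation.

```python
import random

def simulate(mu_neg=0.0, sigma_neg=1.0, mu_pos=2.0, sigma_pos=1.0,
             n=1000, prevalence=0.3, threshold=1.0, seed=0):
    """Draw scores from two Gaussians and count the confusion-matrix cells."""
    rng = random.Random(seed)
    tp = fp = fn = tn = 0
    for _ in range(n):
        if rng.random() < prevalence:            # actually positive
            score = rng.gauss(mu_pos, sigma_pos)
            if score >= threshold:
                tp += 1
            else:
                fn += 1
        else:                                    # actually negative
            score = rng.gauss(mu_neg, sigma_neg)
            if score >= threshold:
                fp += 1
            else:
                tn += 1
    return tp, fp, fn, tn
```

Moving the threshold left or right trades FN against FP, which is exactly the sweep that traces the ROC curve.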
Confusion Matrix

                      Actual Positive   Actual Negative
Predicted Positive    TP                FP
Predicted Negative    FN                TN
Metrics
Hover a metric card, or tap it on iPhone, to highlight the matching pieces in the mosaic and confusion matrix. Tap the same card again to clear.
TPR (Recall)
TP/(TP+FN)
FPR
FP/(FP+TN)
Precision (PPV)
TP/(TP+FP)
Accuracy
(TP+TN)/N
F1 Score
2·P·R/(P+R)
AUC
Area under ROC
NPV
TN/(TN+FN)
MCC
Matthews corr.
Click any metric card in Values to jump here. These are the standard formulas written in the usual notation.
TPR (Recall)
Traditional Formula
TPR = TP / (TP + FN)
Also written as: Recall = TP / (TP + FN)
Among the actual positives, what fraction did the model catch?
FPR
Traditional Formula
FPR = FP / (FP + TN)
Also written as: FPR = 1 - TNR
Among the actual negatives, what fraction were wrongly predicted positive?
Precision (PPV)
Traditional Formula
Precision = TP / (TP + FP)
Also written as: PPV = TP / (TP + FP)
Among the predicted positives, what fraction are truly positive?
Accuracy
Traditional Formula
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Also written as: Accuracy = (TP + TN) / N
Out of all predictions, what fraction were correct?
F1 Score
Traditional Formula
F1 = 2TP / (2TP + FP + FN)
Also written as: F1 = 2 · Precision · Recall / (Precision + Recall)
A balanced score that gets high only when both precision and recall are high.
AUC
Traditional Formula
AUC = ∫₀¹ TPR(FPR) d(FPR)
Also written as: area under the ROC curve
It summarizes performance across all thresholds, not just the current one.
NPV
Traditional Formula
NPV = TN / (TN + FN)
Also written as: negative predictive value
Among the predicted negatives, what fraction are truly negative?
MCC
Traditional Formula
MCC = (TP · TN - FP · FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
Range: -1 to 1
Uses all four confusion-matrix cells, which makes it robust under class imbalance.
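All of the formulas above can be computed from the four confusion-matrix cells in one small helper. This is a sketch — the function name `metrics` and the dictionary keys are illustrative.

```python
import math

def metrics(tp, fp, fn, tn):
    """Compute the standard metrics from the four confusion-matrix cells."""
    n = tp + fp + fn + tn
    tpr = tp / (tp + fn)                 # recall / sensitivity
    fpr = fp / (fp + tn)
    ppv = tp / (tp + fp)                 # precision
    npv = tn / (tn + fn)
    acc = (tp + tn) / n
    f1 = 2 * tp / (2 * tp + fp + fn)     # equals 2·P·R/(P+R)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"TPR": tpr, "FPR": fpr, "PPV": ppv, "NPV": npv,
            "ACC": acc, "F1": f1, "MCC": mcc}
```

For example, the defaults above (prevalence 30%, TPR 85%, FPR 20%) on a cohort of 1,000 give the cells TP = 255, FP = 140, FN = 45, TN = 560.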
Live Precision Story
Prevalence changes the group sizes, not the rates
PPV = TPR · Prev / (TPR · Prev + FPR · (1−Prev))
TPR and FPR are rates inside each class. Changing prevalence does not change those rates. It changes how many people those rates are applied to.
Reference cohort of 10,000 — the mosaic above uses your N slider
1. Out of 10,000 people, split them into the actually positive group and the actually negative group.
2. TPR is measured only within the positive group — it is set by the model, not by how large that group is.
3. FPR is the share of actually negative cases wrongly flagged. FPR is measured only within the negative group — it also does not shift when prevalence changes.
4. Precision = TP / (TP + FP) — dividing true positives by all predicted positives.
What changes is precision, because the same rates are applied to bigger or smaller groups.
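The story above can be checked numerically: hold TPR and FPR fixed and vary only prevalence, and precision moves on its own. A minimal sketch of the PPV formula above (the function name and the sample prevalence values are illustrative):

```python
def ppv(tpr, fpr, prev):
    """PPV = TPR·Prev / (TPR·Prev + FPR·(1−Prev))."""
    return tpr * prev / (tpr * prev + fpr * (1 - prev))

# Same rates (TPR = 0.85, FPR = 0.20), different prevalence:
for prev in (0.01, 0.10, 0.30):
    print(f"prev={prev:.2f}  PPV={ppv(0.85, 0.20, prev):.3f}")
```

At 1% prevalence the same model yields a PPV near 4%, because the false-positive rate is applied to a vastly larger negative group.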
Bayes' Theorem
P(D|+) = P(+|D) · P(D) / P(+)
The foundation of probabilistic classification. Given a positive test result (+), what is the probability the patient actually has the disease (D)? The answer depends on three things: how good the test is at catching true cases (likelihood), how common the disease is (prior), and how often the test fires overall (evidence). This is why a 99% accurate test can still give mostly false positives for a rare disease.
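The rare-disease paradox described above can be worked through directly. Here sensitivity = specificity = 0.99 stands in for a "99% accurate" test, and the 0.1% prevalence is illustrative:

```python
def p_disease_given_positive(sens, spec, prev):
    """Bayes: P(D|+) = P(+|D)·P(D) / P(+)."""
    p_pos = sens * prev + (1 - spec) * (1 - prev)  # evidence: overall positive rate
    return sens * prev / p_pos

# A "99% accurate" test (sens = spec = 0.99) on a disease with 0.1% prevalence:
print(p_disease_given_positive(0.99, 0.99, 0.001))  # ≈ 0.09 — most positives are false
```

Even with a near-perfect test, roughly nine positive results in ten are false alarms at this prevalence.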
NPV via Bayes
NPV = TNR · (1−Prev) / (TNR · (1−Prev) + FNR · Prev)
The mirror of PPV: "If the model says negative, how likely is that correct?" When prevalence is low, most negative predictions are correct (high NPV) because negatives dominate the population. As prevalence increases, the fraction of missed positives (FN) grows relative to true negatives, and NPV drops. NPV and PPV move in opposite directions as you change prevalence, which is a key Bayesian insight.
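The NPV formula above can be evaluated the same way to watch it fall as prevalence rises. A sketch — the TNR/FNR values (from TPR = 0.85, FPR = 0.20) are illustrative:

```python
def npv(tnr, fnr, prev):
    """NPV = TNR·(1−Prev) / (TNR·(1−Prev) + FNR·Prev)."""
    return tnr * (1 - prev) / (tnr * (1 - prev) + fnr * prev)

# With TNR = 0.80 (FPR = 0.20) and FNR = 0.15 (TPR = 0.85),
# NPV drops as prevalence rises:
for prev in (0.05, 0.30, 0.70):
    print(f"prev={prev:.2f}  NPV={npv(0.80, 0.15, prev):.3f}")
```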
Sensitivity (Recall / TPR)
TPR = P(+|D) = TP / (TP + FN)
Of all the actual positives, how many did the model catch? A sensitivity of 0.95 means 95% of true cases are detected, but 5% are missed (false negatives). Critical in medical screening: missing a cancer diagnosis (FN) is usually worse than a false alarm (FP). Sensitivity is intrinsic to the model and threshold, not affected by prevalence.
Specificity (TNR)
TNR = P(−|D̄) = TN / (TN + FP)
Of all the actual negatives, how many did the model correctly identify? Specificity of 0.90 means 10% of healthy patients receive a false positive. Like sensitivity, specificity is independent of prevalence. High specificity is important when false positives are costly, such as unnecessary surgeries or wrongful convictions.
Accuracy
Acc = (TP + TN) / N
The most intuitive metric: what fraction of all predictions were correct? However, accuracy is misleading with imbalanced classes. A model that always predicts "negative" on a dataset with 95% negatives achieves 95% accuracy while catching zero true positives. Compare it with the Prevalence slider at extreme values to see this paradox in action.
F1 Score
F1 = 2 · PPV · TPR / (PPV + TPR)
The harmonic mean of precision and recall. The harmonic mean punishes extreme imbalance: if either precision or recall is near zero, F1 collapses. An F1 of 0.8 guarantees that both precision and recall are at least 0.67. It is prevalence-dependent (through precision) and is most useful when you care equally about false positives and false negatives.
AUC (Area Under ROC Curve)
AUC = ∫₀¹ TPR d(FPR)
AUC measures the model's ability to discriminate between classes across all possible thresholds. It equals the probability that a randomly chosen positive scores higher than a randomly chosen negative. AUC = 0.5 means random guessing (diagonal ROC), AUC = 1.0 means perfect separation. AUC is threshold-independent and prevalence-independent, making it ideal for comparing models.
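The probabilistic reading above — AUC is the chance a random positive outranks a random negative — can be computed directly from labeled scores, without tracing the ROC curve. A sketch (ties count as half a win):

```python
def auc(labels, scores):
    """AUC as P(random positive scores higher than random negative), ties = 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise form is O(n²); rank-based implementations compute the same quantity in O(n log n).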
MCC (Matthews Correlation Coefficient)
MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
MCC is a correlation coefficient between the observed and predicted binary classifications, ranging from −1 (total disagreement) through 0 (random) to +1 (perfect). Unlike F1 and accuracy, MCC uses all four confusion matrix quadrants and remains reliable even with highly imbalanced datasets.
Likelihood Ratios
LR+ = TPR / FPR   |   LR− = FNR / TNR
Likelihood ratios express how much a test result shifts the odds. LR+ tells you how much more likely a positive result is in a true positive vs. a false positive. LR+ > 10 is considered strong diagnostic evidence. LR− < 0.1 is strong evidence for ruling out. Unlike PPV/NPV, likelihood ratios are independent of prevalence.
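The likelihood-ratio definitions above, as a small sketch. The 0.95/0.05 test characteristics are illustrative:

```python
def likelihood_ratios(tpr, fpr):
    """LR+ = TPR/FPR;  LR− = FNR/TNR = (1−TPR)/(1−FPR)."""
    lr_pos = tpr / fpr
    lr_neg = (1 - tpr) / (1 - fpr)
    return lr_pos, lr_neg

# A test with TPR = 0.95 and FPR = 0.05:
lr_pos, lr_neg = likelihood_ratios(0.95, 0.05)
print(lr_pos, lr_neg)  # LR+ well above 10 (strong rule-in); LR− well below 0.1 (strong rule-out)
```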
Class Distributions & Threshold
ROC Curve