Why isn't the ROC curve naturally plotted in 3D?

Question

Something that really confuses me with how ROC plots are generated is that, according to Wikipedia:

The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.

So the algorithm seems to be: Set a decision threshold for binary classification, e.g., a number in $[0, 1]$ that determines which data points are labeled $1$ and which $0$. Run the trained classifier on your dataset. Calculate the TPR and FPR, which depend only on the result of your classification. Then plot the point (FPR, TPR) on a 2D plot.

Repeat this process (for different thresholds?) many times, until you get many points. Then linearly interpolate between these points to generate a 2D curve.

Have I understood this process correctly?

So my question is: why isn't this naturally a 3D plot? The threshold parameter seems to be pretty important.

First of all, I cannot tell, for any given point on the ROC curve, which threshold was used. Are the points even plotted according to some sorted order of these thresholds? I cannot tell.
Also, shouldn't the threshold be critical in creating your final classifier? The ROC curve should have a point that corresponds to the best threshold, but I don't know what that threshold is on this plot.
Finally, this is not related to the ROC plot per se but to the concept as a whole. It seems pretty dumb to me to use either a very large or a very small threshold. A decision threshold of $0$ or $1$ would not make any sense in creating a classifier, no?

It just doesn't seem to make sense to neglect this information.

[I had a feeling this came up just a few days ago.](https://stats.stackexchange.com/q/530946/247274) And I agree with the comment to my answer that the drawing gets funky when we have a 2d computer screen. (Having tried a 3D plot, I don’t get any value from it. Perhaps we could color-code the curve to denote the threshold.) // Mods: should this be merged with the linked post from a few days ago? — Dave, Jun 20 '21 at 08:38
Does this answer your question? [Why is the ROC curve two-dimensional instead of three-dimensional?](https://stats.stackexchange.com/questions/530946/why-is-the-roc-curve-two-dimensional-instead-of-three-dimensional) — Calimo, Jun 20 '21 at 08:56

score 6 · Answer 1 · answered Jun 20 '21 at 07:47

First off, you are right that we could in principle plot the ROC curve as a curve in three-dimensional space. One axis would be the threshold, and the other two would be the TPR and the FPR.

As to why this is not done: I don't know who invented the ROC curve or what their thought processes were (that might be interesting to track down), but one reason is that a curve in 3D space is simply harder to understand. This is especially true because we cannot plot a 3D curve directly, only some projection of it onto 2D paper or a screen, even if we use clever tricks like rotating the projection in an animation. Thus, we drop the threshold: the ROC curve is a parameterized curve with the threshold as its parameter, and we plot only the curve traced out by (FPR, TPR), not the parameter itself. The same concept appears elsewhere; for instance, you could plot the trajectory of a particle parameterized by time without showing the time axis.
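To make the parameterization concrete, here is a minimal sketch in plain Python (the function name `roc_points` and the toy scores and labels are hypothetical, chosen only for illustration). It assumes a sample is predicted positive when its score is at or above the threshold, and it keeps the threshold as a third coordinate; the usual 2D ROC curve is what remains after dropping that coordinate. Only the distinct observed scores need to be tried as thresholds, since TPR and FPR can only change when the threshold crosses a score.

```python
def roc_points(scores, labels):
    """Return (threshold, FPR, TPR) triples for a threshold sweep.

    A sample is predicted positive when score >= threshold.
    labels must be 1 (positive) or 0 (negative), with both classes present.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    # An infinite threshold predicts nothing positive: the (0, 0) corner.
    points = [(float("inf"), 0.0, 0.0)]
    # Sweep from the highest observed score down to the lowest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


# Toy data (hypothetical).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

curve_3d = roc_points(scores, labels)                # (threshold, FPR, TPR)
curve_2d = [(fpr, tpr) for _, fpr, tpr in curve_3d]  # the usual ROC curve
for t, fpr, tpr in curve_3d:
    print(f"threshold={t:<5}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Reading the printed rows top to bottom also shows that the points come out already ordered by threshold.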

To your three bullet points:

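You are right that we cannot read off from the plot which threshold belongs to which point. The points are ordered, though: if we predict the positive class whenever the score is at or above the threshold, both the TPR and the FPR decrease (or stay the same) as the threshold increases, so if point A is to the right of and above point B, the threshold is lower at A than at B.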
The ROC is an instrument that attempts to evaluate your entire model, not a classifier that includes a threshold. Setting the threshold is not part of the modeling step (where we want to get a handle on class membership probabilities) but of the decision step (where we want to make a decision based on probabilities, but also on costs). An "optimal" threshold cannot be set based on the statistics alone; it also requires knowledge about costs. Take a look at "Is threshold moving unnecessary in balanced classification problem?"
Thresholds of exactly zero or one will indeed most likely not make any sense. I find it hard to imagine a situation where we would model class membership but subsequently decide to treat every instance as if it belonged to class A (or to class B). However, as per my answer to the question linked above, it can make sense to use very large or very small thresholds, depending on the costs. If you are sitting in the control room of a nuclear reactor and your gauges give you a very small probability that the reactor could go out of control, then you take action, even if that probability is tiny, simply because the costs of not doing anything are astronomical in the improbable case that something Bad does occur.
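The cost argument can be made concrete with the standard expected-cost decision rule. This is a minimal sketch, assuming the model outputs calibrated probabilities p = P(positive | x), that correct decisions cost nothing, and that the cost figures below are purely hypothetical. Acting (predicting positive) has expected cost (1 - p) · c_FP, not acting has expected cost p · c_FN, so we act when p exceeds c_FP / (c_FP + c_FN).

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected misclassification cost.

    Assumes calibrated probabilities and zero cost for correct decisions.
    Predict positive (take action) when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def decide(p, cost_fp, cost_fn):
    """Return True if acting is cheaper in expectation than not acting."""
    return p > cost_threshold(cost_fp, cost_fn)


# Symmetric (hypothetical) costs give the familiar 0.5 threshold.
print(cost_threshold(1, 1))  # 0.5

# Reactor-style (hypothetical) costs: a false alarm costs 1 unit and a missed
# emergency costs a million, so even a 0.01% probability triggers action.
print(cost_threshold(1, 1_000_000))
print(decide(0.0001, 1, 1_000_000))  # True
```

The threshold moves toward zero as missed positives become relatively more expensive, which is exactly the "tiny probability, take action anyway" situation.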

I am under the impression that ROC came from British radar operators during World War Two. — Dave, Jun 20 '21 at 08:24

score 1 · Answer 2 · answered Jun 20 '21 at 09:33

Are the points even plotted according to some sorted order of these thresholds?

No, there is no guarantee that ordered thresholds be reflected on the curve.

Also, shouldn't the threshold be critical in creating your final classifier? The ROC curve should have a point that corresponds to the best threshold. But I don't know what that threshold is on this plot.

Yes. The best threshold is usually picked based on the costs, but sometimes it is chosen by maximizing the F1 score; see Thresholding Classifiers to Maximize F1 Score.
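F1 maximization can be sketched as a brute-force search over the observed scores. This is a minimal illustration (the function name and toy data are hypothetical), assuming a sample is predicted positive when its score is at or above the threshold. Note that, unlike TPR and FPR, the F1 score depends on the class proportions in the data.

```python
def best_f1_threshold(scores, labels):
    """Return (threshold, F1) maximizing F1 over the distinct observed scores.

    Predict positive when score >= threshold; labels are 1 or 0.
    """
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        # F1 = 2TP / (2TP + FP + FN), defined as 0 when there are no true positives.
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(best_f1_threshold(scores, labels))  # (0.4, 0.8)
```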

would not make any sense in creating a classifier, no?

Yes. A threshold that is too low or too high usually indicates problems with the classifier.

Please note that the ROC is heavily criticized for being hard to interpret, for being affected by class imbalance, and, famously, for not being a coherent measure (see the H-measure). Results based on the ROC AUC should be interpreted with great caution.

For the first part: TPR and FPR are monotone functions of the threshold, so the points on the curve are indeed ordered by threshold. — Ben Reiniger, Jun 20 '21 at 15:15
Agree that thresholds are always ordered monotonically along an ROC curve. The highest possible threshold is always at one end of the curve and the lowest possible threshold at the other, with all other possible thresholds falling in order along the curve. — Nuclear Hoagie, Aug 05 '21 at 16:26

score 1 · Answer 3 · answered Jun 20 '21 at 23:04

The purpose of an ROC curve is to summarize the performance of a classifier. The majority of binary classifiers in common use produce a score that can be trivially mapped to the interval [0, 1] (indeed, it is standard to output a score that has been mapped this way). It stands to reason that as you increase sensitivity (fewer false negatives), you will lose specificity (more false positives), and vice versa. This is the basic trade-off of most classification: if you are too strict, you will miss some of the signal, but if you are too lenient, you will capture too much noise.

So we use the ROC curve when the precise threshold is irrelevant and we instead care about the distributions of the classifier's scores for positives and negatives (ideally, they should differ). In fact, we don't really care about the distributions themselves, but about a more basic question: do these scores distinguish the two classes or not? This is what the ROC measures.
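One standard way to quantify "do these scores distinguish the classes" is the area under the ROC curve, which equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties counted as one half). Below is a minimal pairwise sketch with a hypothetical function name and toy data; it is O(n_pos · n_neg), fine for illustration but not for large datasets.

```python
def auc_pairwise(scores, labels):
    """AUC as P(score of random positive > score of random negative).

    Ties count as 1/2. labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auc_pairwise(scores, labels))  # 0.8125 (13 of 16 pairs ordered correctly)
```

No threshold appears anywhere in this computation, which is the sense in which the ROC evaluates the scores rather than any particular threshold.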

TPR and FPR are the two variables that matter, hence we plot them on the classic 2D plot. If a reader is happy with the overall ROC of the classifier but wants a certain TPR or FPR, there are many easy ways of estimating a threshold from the ROC. However, the ROC curve is not plotted to evaluate a given threshold; it is plotted to evaluate a classifier. Thus adding a third dimension, as you suggest, would not serve the goal of the plot.
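As one example of such an easy way: if the reader can tolerate at most a given FPR, they can walk down the thresholds and keep the lowest one whose FPR stays within the limit. This is a minimal sketch, assuming a sample is predicted positive when its score is at or above the threshold; the function name, the 0.25 limit and the toy data are hypothetical.

```python
def threshold_for_max_fpr(scores, labels, max_fpr):
    """Return (threshold, FPR, TPR) for the lowest threshold with FPR <= max_fpr.

    Lowering the threshold never decreases TPR, so the lowest admissible
    threshold also gives the highest admissible TPR.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = (float("inf"), 0.0, 0.0)  # predict nothing positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / n_neg > max_fpr:
            break  # FPR can only grow as the threshold keeps falling
        best = (t, fp / n_neg, tp / n_pos)
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(threshold_for_max_fpr(scores, labels, max_fpr=0.25))  # (0.6, 0.25, 0.75)
```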

3D plots are also very difficult to read, even with an interactive display. A lot of technical communication still happens through static documents, such as papers, where 3D plots would be a definite no-no. Even potentially dynamic media, such as websites, are (and should be) wary of displaying unnecessary interactive widgets, because users are loath to deal with them. If you really wanted to represent the threshold, you could simply plot it in a second panel below the graph, aligned to the FPR axis.

Are the points even plotted according to some sorted order of threshold?

Sane classifier training algorithms tend to produce monotonically increasing TPR and FPR wrt threshold.

The ROC curve should show the best threshold.

No, because "best" is subjective. You pick the best threshold according to the relative value you place on false positives vs. false negatives.

For a significant application, you wouldn't just pick a point on the ROC curve to select your threshold. You would do additional simulations to estimate the threshold based on the validation data you have.
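One simple version of estimating the threshold from validation data is to pick the candidate threshold that minimizes the total misclassification cost on that data. This is a minimal sketch, not the "additional simulations" the answer refers to; the function name, the cost values and the toy validation scores are hypothetical, and a sample is assumed to be predicted positive when its score is at or above the threshold.

```python
def min_cost_threshold(scores, labels, cost_fp, cost_fn):
    """Return (threshold, total_cost) minimizing cost on validation data.

    Predict positive when score >= threshold. Candidates are the distinct
    scores plus +inf (never predict positive). Ties keep the lowest threshold.
    """
    best_t, best_cost = None, float("inf")
    for t in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


val_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
val_labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(min_cost_threshold(val_scores, val_labels, cost_fp=1, cost_fn=10))  # (0.4, 2)
print(min_cost_threshold(val_scores, val_labels, cost_fp=10, cost_fn=1))  # (0.8, 2)
```

Making false positives more expensive pushes the chosen threshold up, and making false negatives more expensive pushes it down.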

It seems pretty dumb to me to either use a very large or very small threshold.

No, it is not dumb at all; it depends entirely on the application. Furthermore, you could trivially apply something like a log or power transform to the score and convert an "extreme" threshold value into a moderate one, or vice versa.
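The point about transforms can be checked directly: a strictly increasing transform of the scores (a log, or a positive power of positive scores) leaves the ROC curve unchanged and only relabels the thresholds. A minimal sketch with a hypothetical helper and toy data, assuming a sample is predicted positive when its score is at or above the threshold:

```python
import math


def roc_curve_points(scores, labels):
    """(FPR, TPR) pairs, sweeping the threshold over the distinct scores.

    Predict positive when score >= threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

log_scores = [math.log(s) for s in scores]  # strictly increasing transform
cubed_scores = [s ** 3 for s in scores]     # also strictly increasing for s > 0

original = roc_curve_points(scores, labels)
print(original == roc_curve_points(log_scores, labels))    # True
print(original == roc_curve_points(cubed_scores, labels))  # True

# A moderate threshold of 0.1 on the original scale corresponds to an
# extreme-looking threshold of about 0.001 on the cubed scale.
print(0.1 ** 3)
```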

"Sane classifier training algorithms tend to produce monotonically increasing TPR and FPR wrt threshold." I don't think this has anything to do with the algorithm? The numerators are monotone as more samples cross the threshold into positive classification, while the denominators are fixed based on the true labels. — Ben Reiniger, Jun 21 '21 at 01:41
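The comment's argument can be checked numerically: even for scores that are pure noise (no "sane" training algorithm involved), TPR and FPR never increase as the threshold is raised, because the set of samples with score >= threshold only shrinks while the class totals stay fixed. A minimal sketch with hypothetical random data, assuming the positive class is predicted when score >= threshold:

```python
import random


def rates(scores, labels, threshold):
    """Return (FPR, TPR) when predicting positive for score >= threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return fp / n_neg, tp / n_pos


random.seed(0)
labels = [random.randint(0, 1) for _ in range(200)]
scores = [random.random() for _ in labels]  # scores unrelated to the labels

# Rates at every distinct threshold, from the lowest threshold to the highest.
curve = [rates(scores, labels, t) for t in sorted(set(scores))]

# Each step up in threshold can only lower (or keep) both rates.
non_increasing = all(
    f_next <= f and t_next <= t
    for (f, t), (f_next, t_next) in zip(curve, curve[1:])
)
print(non_increasing)  # True
```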
