Most so-called "classifiers" actually output probabilities of class membership. One can apply a threshold to map those probabilities to discrete categories, but statisticians generally favor evaluating the predicted probabilities directly, e.g., with proper scoring rules like log-loss or the Brier score. This sidesteps problems like class imbalance, because a reasonable model should assign a low probability to a rare class anyway.
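As a small illustration (Python with NumPy/scikit-learn; the 0.1% positive rate and the probability values are made up), scoring the probabilities directly can distinguish a model that carries some signal from one that just predicts the base rate, while thresholding at 0.5 labels every case negative and gives both models the same ~99.9% accuracy:

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

rng = np.random.default_rng(0)

# Hypothetical data with a 0.1% positive rate (numbers made up for illustration).
n, base_rate = 100_000, 0.001
y = rng.binomial(1, base_rate, size=n)

# Model A has some signal: it assigns higher probabilities to the actual positives.
p_a = np.where(y == 1, 0.30, 0.001)
# Model B ignores the features and always predicts the base rate.
p_b = np.full(n, base_rate)

# Thresholding at 0.5 calls every case negative for both models,
# so accuracy cannot tell them apart.
print(np.mean((p_a >= 0.5) == y), np.mean((p_b >= 0.5) == y))   # ~0.999 for both

# Proper scoring rules on the probabilities do tell them apart (lower is better).
print(brier_score_loss(y, p_a), brier_score_loss(y, p_b))
print(log_loss(y, p_a), log_loss(y, p_b))
```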
But some classifiers really do classify, and I am inspired to ask how to evaluate them after walking past the pregnancy tests at the pharmacy last night. Such a device does not report a probability, just a number of stripes that tells the user the category: pregnant or not pregnant.
In this case, log-loss involves $\log(0)$: the convention $0\log(0) = 0$ takes care of the correct hard predictions, but any incorrect prediction contributes $\log(0)$ with a nonzero coefficient, so the loss is infinite (and leaning on that convention feels slightly like cheating anyway). The Brier score on $0/1$ predictions reduces to the misclassification rate, i.e., one minus accuracy, and accuracy has well-known issues when classes are imbalanced, such as tricking us into thinking that our $99\%$ accuracy is like an A+ in school, even though always predicting the majority class would score $99.9\%$.
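To make that concrete, here is a minimal sketch in Python/NumPy (the 1000-subject, one-positive dataset is made up for illustration) showing that on hard $0/1$ predictions accuracy looks stellar, the Brier score collapses to the misclassification rate, and the log-loss is infinite even with the $0\log(0) = 0$ convention:

```python
import numpy as np

# Hypothetical toy data (numbers made up): 1000 subjects, one of whom is in the
# rare class, scored by a hard classifier that always outputs the majority class.
y_true = np.zeros(1000)
y_true[0] = 1.0
y_pred = np.zeros(1000)   # hard 0/1 "probabilities"

# Accuracy looks like an A+ even though the rare class is never detected.
accuracy = np.mean(y_pred == y_true)        # 0.999

# On 0/1 predictions the Brier score is just the misclassification rate.
brier = np.mean((y_pred - y_true) ** 2)     # 0.001 == 1 - accuracy

# Log-loss with the 0*log(0) = 0 convention applied term by term:
# correct hard predictions contribute 0, but the single wrong one
# contributes -log(0) = inf, so the average is infinite.
with np.errstate(divide="ignore"):
    pos_terms = np.where(y_true == 1, -np.log(y_pred), 0.0)
    neg_terms = np.where(y_true == 0, -np.log(1.0 - y_pred), 0.0)
log_loss_value = np.mean(pos_terms + neg_terms)   # inf

print(accuracy, brier, log_loss_value)
```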
What would be the right way to analyze such a classifier?