Most so-called "classifiers" actually output probabilities of class membership. One can apply a threshold to map those probabilities to discrete categories, but statisticians generally favor evaluating the predicted probabilities directly, e.g., with proper scoring rules like log-loss or the Brier score. This sidesteps problems like class imbalance, because a reasonable model should assign a low probability to a rare class anyway.
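As a small illustration (Python with NumPy/scikit-learn; the 0.1% positive rate and the probability values are made up), scoring the probabilities directly can distinguish a model that carries some signal from one that just predicts the base rate, while thresholding at 0.5 labels every case negative and gives both models the same ~99.9% accuracy:

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

rng = np.random.default_rng(0)

# Hypothetical data with a 0.1% positive rate (numbers made up for illustration).
n, base_rate = 100_000, 0.001
y = rng.binomial(1, base_rate, size=n)

# Model A has some signal: it assigns higher probabilities to the actual positives.
p_a = np.where(y == 1, 0.30, 0.001)
# Model B ignores the features and always predicts the base rate.
p_b = np.full(n, base_rate)

# Thresholding at 0.5 calls every case negative for both models,
# so accuracy cannot tell them apart.
print(np.mean((p_a >= 0.5) == y), np.mean((p_b >= 0.5) == y))   # ~0.999 for both

# Proper scoring rules on the probabilities do tell them apart (lower is better).
print(brier_score_loss(y, p_a), brier_score_loss(y, p_b))
print(log_loss(y, p_a), log_loss(y, p_b))
```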
But some classifiers really do classify, and I am inspired to ask how to evaluate them after walking past the pregnancy tests at the pharmacy last night. Such a device does not report a probability, just a number of stripes that tells the user the category: pregnant or not pregnant.
In this case, log-loss involves $\log(0)$: the convention $0\log(0) = 0$ takes care of the correct hard predictions, but any incorrect prediction contributes $\log(0)$ with a nonzero coefficient, so the loss is infinite (and leaning on that convention feels slightly like cheating anyway). The Brier score on $0/1$ predictions reduces to the misclassification rate, i.e., one minus accuracy, and accuracy has well-known issues when classes are imbalanced, such as tricking us into thinking that our $99\%$ accuracy is like an A+ in school, even though always predicting the majority class would score $99.9\%$.
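To make that concrete, here is a minimal sketch in Python/NumPy (the 1000-subject, one-positive dataset is made up for illustration) showing that on hard $0/1$ predictions accuracy looks stellar, the Brier score collapses to the misclassification rate, and the log-loss is infinite even with the $0\log(0) = 0$ convention:

```python
import numpy as np

# Hypothetical toy data (numbers made up): 1000 subjects, one of whom is in the
# rare class, scored by a hard classifier that always outputs the majority class.
y_true = np.zeros(1000)
y_true[0] = 1.0
y_pred = np.zeros(1000)   # hard 0/1 "probabilities"

# Accuracy looks like an A+ even though the rare class is never detected.
accuracy = np.mean(y_pred == y_true)        # 0.999

# On 0/1 predictions the Brier score is just the misclassification rate.
brier = np.mean((y_pred - y_true) ** 2)     # 0.001 == 1 - accuracy

# Log-loss with the 0*log(0) = 0 convention applied term by term:
# correct hard predictions contribute 0, but the single wrong one
# contributes -log(0) = inf, so the average is infinite.
with np.errstate(divide="ignore"):
    pos_terms = np.where(y_true == 1, -np.log(y_pred), 0.0)
    neg_terms = np.where(y_true == 0, -np.log(1.0 - y_pred), 0.0)
log_loss_value = np.mean(pos_terms + neg_terms)   # inf

print(accuracy, brier, log_loss_value)
```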
What would be the right way to analyze such a classifier?