
A long time ago I seem to remember learning that rather than just assigning an accuracy value of 0 or 1 to each sample, one could calculate a weighted accuracy for a sample based on the confidence of the classifier. For example, if p(Y=1|data) = 0.8 and Y=1, then the sample accuracy would be considered to be 0.8 rather than 1. Similarly, if Y != 1, then the output is the predicted probability of the actual class membership, so for binary classification the output would be 0.2.

This approach would then be useful for comparing classifiers, because the final score increases as the classifier's confidence in the correct class increases.
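
For concreteness, here is a minimal numpy sketch of the scoring I have in mind (the numbers are made up):

```python
import numpy as np

# Made-up predicted probabilities p(Y=1 | data) and true labels.
p_hat  = np.array([0.8, 0.3, 0.9, 0.6])
y_true = np.array([1,   0,   1,   0])

# Per-sample "accuracy" = predicted probability of the actual class:
# p_hat when y = 1, and 1 - p_hat when y = 0.
sample_scores = np.where(y_true == 1, p_hat, 1.0 - p_hat)
print(sample_scores)         # [0.8 0.7 0.9 0.4]
print(sample_scores.mean())  # 0.7, the overall "confidence-weighted accuracy"
```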

My question is: although I'm relatively sure this technique is a thing, I can't find any discussion of it online. Is there a name for it? Weighted accuracy? Class-probability accuracy? Jim's old-fashioned log-accuracy? I was hoping to find wording a bit more formal than what I have described, as well as any extended discussion of this approach.

Jimbo

2 Answers


You may be thinking of the log-loss, which is the default loss function for probabilistic classification models:

$$ L(y, p) = -\sum_i \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] $$

This loss function measures how well the predicted probabilities agree with the true labels, but with a severe penalty for being confidently wrong: $-\log(p) \rightarrow \infty$ as $p \rightarrow 0$, and $-\log(1 - p) \rightarrow \infty$ as $p \rightarrow 1$.
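
As a concrete illustration, here is a direct numpy translation of that formula, using made-up labels and predictions (the clipping is a standard guard against taking log(0); `sklearn.metrics.log_loss` computes the same quantity):

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    """Negative Bernoulli log-likelihood of the labels under the predictions."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0) for hard 0/1 predictions
    return -np.sum(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 0])
p = np.array([0.8, 0.3, 0.9, 0.6])
print(log_loss(y, p))  # ~1.60; confident correct predictions contribute little
```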

There is indeed a more formal justification for this choice of loss function: it is derived by applying the principle of maximum likelihood to probabilistic models.
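
Sketching that derivation: assuming independent labels $y_i \sim \text{Bernoulli}(p_i)$, the likelihood of the observed labels is

$$ \mathcal{L}(p) = \prod_i p_i^{y_i} (1 - p_i)^{1 - y_i} $$

so maximizing the likelihood is equivalent to minimizing the negative log-likelihood, which is exactly the log-loss above:

$$ -\log \mathcal{L}(p) = -\sum_i \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] $$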

Matthew Drury

What you call "the confidence of a classifier" is more commonly called a "probabilistic prediction" or "probabilistic classification".

Your mapping then takes these probabilistic predictions and the corresponding actuals to a loss function. Loss functions of this kind are called scoring rules. More information and further reading on scoring rules can be found in the tag wiki, and yes, they are a much better way of evaluating a classifier than accuracy. The log loss that Matthew Drury discusses is one example of such a scoring rule.
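
For instance, the Brier score is another widely used, strictly proper scoring rule. A minimal sketch with made-up predictions (equivalent to `sklearn.metrics.brier_score_loss`):

```python
import numpy as np

# Brier score: mean squared difference between predicted probability and outcome.
# Like the log loss, it is strictly proper, so in expectation it is minimized
# by reporting the true class probabilities.
def brier_score(y_true, p_pred):
    return np.mean((p_pred - y_true) ** 2)

y = np.array([1, 0, 1, 0])
p = np.array([0.8, 0.3, 0.9, 0.6])
print(brier_score(y, p))  # 0.125
```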

Stephan Kolassa