Consider a classifier that, given an input vector ${\bf x}$, outputs both a prediction $y'$ whose accuracy ($a \in \{0, 1\}$) can be measured and a predicted accuracy, i.e., the predicted probability ($p \in [0, 1]$) that the prediction is correct. The accuracy is computed w.r.t. the known target value $y$: $a = 1$ if $y = y'$ and $a = 0$ otherwise. Ideally, the predicted accuracy represents $p(y = y' \mid {\bf x})$, where $y' = f({\bf x})$ and $f$ is the function computed by the classifier.
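For concreteness, here is a minimal sketch of how the $(a_i, p_i)$ pairs could be collected, assuming NumPy and hypothetical arrays `y_true`, `y_pred` and `p` (these names and values are only for illustration):

```python
import numpy as np

# Hypothetical example data: known targets, the classifier's predictions,
# and its predicted probability of being correct for each input.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])
p      = np.array([0.9, 0.8, 0.6, 0.7, 0.95])

# a_i = 1 if the prediction is correct, 0 otherwise
a = (y_true == y_pred).astype(float)
```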
My question is the following: Given $n$ predictions of the form $(a_i, p_i)$, how can I measure how well the classifier predicted its own accuracy?
I can think of two ways to tackle this, but I wonder whether these approaches are sound and whether there are other established methods out there.
Binning: Sort the predictions by their $p_i$ values and split them into bins $B_j$. The observed accuracy in each bin can then be estimated as $a_j = \frac{1}{|B_j|} \sum_{(a_i, p_i) \in B_j} a_i$ and compared to the average predicted accuracy in that bin, $\frac{1}{|B_j|} \sum_{(a_i, p_i) \in B_j} p_i$. However, a fair amount of information is lost through binning if only few predictions are available.
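A minimal sketch of this binning approach, assuming the arrays `a` and `p` from above; the number of bins and the equal-size split are arbitrary choices, not part of any fixed recipe:

```python
import numpy as np

def binned_calibration(a, p, n_bins=10):
    """Sort by predicted accuracy, split into (roughly) equal-sized bins,
    and return (average predicted accuracy, observed accuracy) per bin."""
    order = np.argsort(p)
    pairs = []
    for a_bin, p_bin in zip(np.array_split(a[order], n_bins),
                            np.array_split(p[order], n_bins)):
        pairs.append((p_bin.mean(), a_bin.mean()))
    return pairs
```

Comparing the two values per bin (e.g., by plotting them against each other) then shows where the predicted accuracy deviates from the observed one.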
Squared distance: $\frac{1}{n} \sum_{i = 1}^n (a_i - p_i)^2$. Measuring the average squared distance between the per-prediction accuracy and the predicted accuracy yields a relative measure that is minimized when the predicted accuracy matches the true accuracy per prediction. It is nice that this method forgoes binning and treats each prediction on its own, but the resulting number is a bit hard to interpret.
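A sketch of this squared-distance measure, again assuming the arrays `a` and `p` from the example above:

```python
import numpy as np

def mean_squared_calibration(a, p):
    """Average squared distance between observed accuracy (0 or 1)
    and predicted accuracy, i.e. (1/n) * sum_i (a_i - p_i)^2."""
    return float(np.mean((a - p) ** 2))
```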