Consider a classifier that, given an input vector ${\bf x}$, outputs both a prediction $y'$ whose accuracy ($a \in \{0, 1\}$) can be measured and a predicted accuracy, i.e., the predicted probability ($p \in [0, 1]$) that the prediction is correct. The accuracy is computed w.r.t. the known target value $y$: $a = 1$ if $y = y'$ and $a = 0$ otherwise. Ideally, the predicted accuracy represents $p(y = y' \mid {\bf x})$, where $y' = f({\bf x})$ and $f$ is the function computed by the classifier.
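For concreteness, here is a minimal sketch of how the $(a_i, p_i)$ pairs could be collected, assuming NumPy and hypothetical arrays `y_true`, `y_pred` and `p` (these names and values are only for illustration):

```python
import numpy as np

# Hypothetical example data: known targets, the classifier's predictions,
# and its predicted probability of being correct for each input.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])
p      = np.array([0.9, 0.8, 0.6, 0.7, 0.95])

# a_i = 1 if the prediction is correct, 0 otherwise
a = (y_true == y_pred).astype(float)
```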
My question is the following: Given $n$ predictions of the form $(a_i, p_i)$, how can I measure how well the classifier predicted its own accuracy?
I can think of two ways to tackle this, but I wonder whether these approaches are sound and whether there are other established methods out there.
Binning: Sort the predictions by their $p_i$ values and split them into bins $B_j$. The observed accuracy in each bin can then be estimated as $a_j = \frac{1}{|B_j|} \sum_{(a_i, p_i) \in B_j} a_i$ and compared to the average predicted accuracy in that bin, $\frac{1}{|B_j|} \sum_{(a_i, p_i) \in B_j} p_i$. However, a fair amount of information is lost through binning if only few predictions are available.
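A minimal sketch of this binning approach, assuming the arrays `a` and `p` from above; the number of bins and the equal-size split are arbitrary choices, not part of any fixed recipe:

```python
import numpy as np

def binned_calibration(a, p, n_bins=10):
    """Sort by predicted accuracy, split into (roughly) equal-sized bins,
    and return (average predicted accuracy, observed accuracy) per bin."""
    order = np.argsort(p)
    pairs = []
    for a_bin, p_bin in zip(np.array_split(a[order], n_bins),
                            np.array_split(p[order], n_bins)):
        pairs.append((p_bin.mean(), a_bin.mean()))
    return pairs
```

Comparing the two values per bin (e.g., by plotting them against each other) then shows where the predicted accuracy deviates from the observed one.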
Squared distance: $\frac{1}{n} \sum_{i = 1}^n (a_i - p_i)^2$. Measuring the average squared distance between the per-prediction accuracy and the predicted accuracy yields a relative measure that is minimized when the predicted accuracy matches the true accuracy per prediction. It is nice that this method forgoes binning and treats each prediction on its own, but the resulting number is a bit hard to interpret.
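A sketch of this squared-distance measure, again assuming the arrays `a` and `p` from the example above:

```python
import numpy as np

def mean_squared_calibration(a, p):
    """Average squared distance between observed accuracy (0 or 1)
    and predicted accuracy, i.e. (1/n) * sum_i (a_i - p_i)^2."""
    return float(np.mean((a - p) ** 2))
```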