Scoring classification model performance often seems somewhat abstract (looking at you, AUC scores...). There's always accuracy, which has the advantage of being nice and easy to comprehend, and which is great for explaining how well the model will work to someone else (say, the people who are actually going to use the predictions it makes). I'd intuitively expect there to be a similarly common metric for probability predictions, for example a simple "average distance from truth" along the lines of:
| Truth | Prediction | Score |
| ----- | ---------- | ----- |
| 1 | 0.97 | 0.03 |
| 0 | 0.35 | 0.35 |
| 1 | 0.76 | 0.24 |
| 0 | 0.42 | 0.42 |
With the score for the model as a whole being the average of those values: 0.26 in this case. That's pretty easy to do manually, but it surprises me that a) this isn't a common scoring metric, and b) there doesn't seem to be a built-in method for it in the scikit-learn API.
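To be concrete, the calculation I have in mind is just something like this rough sketch (Python/numpy, using the values from the table above):

```python
import numpy as np

# Values from the example table
truth = np.array([1, 0, 1, 0])
prediction = np.array([0.97, 0.35, 0.76, 0.42])

# "Average distance from truth": mean absolute difference
# between the true labels and the predicted probabilities
score = np.mean(np.abs(truth - prediction))
print(score)  # 0.26
```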
So my question is this: is "average distance from truth" a useful scoring metric, and if the answer is no, why not?