Test Set Probabilities and Accuracy

Question

Say we have a logistic regression model $M$ used as a binary classifier. Given a test set $\tau=\{(x_1,y_1),\dots,(x_n,y_n)\}$, each test sample is assigned a predicted probability $\hat{\pi}_i=P(y_i=1|x_i)$, and then, assuming a naive threshold of 0.5, a prediction is made: $\hat{y}_i=\left\{\begin{matrix}1 & \hat{\pi}_i\geq 0.5\\ 0 & \hat{\pi}_i< 0.5\end{matrix}\right.$.

We can then discuss $M$'s accuracy rate, $\frac{1}{n}\sum_{i}{I\{\hat{y}_i=y_i\}}$, or maybe its certainty rate, $\frac{1}{n}\sum_{i}{\max(\hat{\pi}_i,1-\hat{\pi}_i)}$.
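To make the two quantities concrete, here is a minimal sketch that computes them from a vector of predicted probabilities. The function names `accuracy_rate` and `certainty_rate` and the toy numbers are my own illustration, not taken from any library or real dataset.

```python
import numpy as np

def accuracy_rate(pi_hat, y, threshold=0.5):
    """Share of samples whose thresholded prediction equals the true label."""
    pi_hat = np.asarray(pi_hat, dtype=float)
    y = np.asarray(y, dtype=int)
    # y_hat_i = 1 if pi_hat_i >= threshold, else 0
    y_hat = (pi_hat >= threshold).astype(int)
    return float(np.mean(y_hat == y))

def certainty_rate(pi_hat):
    """Mean of max(pi_hat_i, 1 - pi_hat_i); note it never looks at the labels."""
    pi_hat = np.asarray(pi_hat, dtype=float)
    return float(np.mean(np.maximum(pi_hat, 1.0 - pi_hat)))

# Hypothetical predicted probabilities and true labels for four test samples
pi_hat = [0.9, 0.6, 0.4, 0.2]
y = [1, 0, 0, 0]

print(accuracy_rate(pi_hat, y))   # 3 of 4 predictions are correct
print(certainty_rate(pi_hat))     # mean of 0.9, 0.6, 0.6, 0.8
```

Note that the certainty rate depends only on $\hat{\pi}_i$, so on its own it says nothing about whether the confident predictions are correct.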

Wherever I've looked, a logistic regression classifier is assessed either by information criteria or by cross-validation, both of which eventually relate to the degree of fit to the training dataset.

This might look like an outsider's question, but are there any known methods that use accuracy or certainty? It might just be me missing this information somehow.

There is a reason accuracy is not used/should not be used: it is not a proper score function. Search this site — kjetil b halvorsen, Dec 05 '18 at 23:59
(dups): There are many relevant posts here already. Some: https://stats.stackexchange.com/questions/145875/alternative-notions-to-that-of-proper-scoring-rules-and-using-scoring-rules-to?r=SearchResults, https://stats.stackexchange.com/questions/148014/the-intuition-behind-the-different-scoring-rules?r=SearchResults, https://stats.stackexchange.com/questions/126965/justifying-and-choosing-a-proper-scoring-rule and many others. Also look through Frank Harrell's blog: http://www.fharrell.com/ — kjetil b halvorsen, Dec 11 '18 at 13:22
It seems that the Brier score would fit my needs; I'll investigate this further. Thanks! — Spätzle, Dec 11 '18 at 15:10
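For reference, the Brier score mentioned in the comments is the mean squared difference between the predicted probability and the 0/1 outcome, $\frac{1}{n}\sum_{i}{(\hat{\pi}_i-y_i)^2}$. Below is a minimal sketch with made-up numbers (the helper names and both probability vectors are hypothetical) showing two sets of predictions that have the same accuracy at the 0.5 threshold but different Brier scores, which is roughly the point about accuracy not being a proper score.

```python
import numpy as np

def brier_score(pi_hat, y):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    pi_hat = np.asarray(pi_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((pi_hat - y) ** 2))

def accuracy_rate(pi_hat, y, threshold=0.5):
    """Share of samples whose thresholded prediction equals the true label."""
    y_hat = (np.asarray(pi_hat, dtype=float) >= threshold).astype(int)
    return float(np.mean(y_hat == np.asarray(y, dtype=int)))

# Hypothetical labels and two hypothetical sets of predicted probabilities
y = [1, 0, 0, 0]
sharper = [0.9, 0.6, 0.4, 0.2]       # probabilities spread away from 0.5
flatter = [0.55, 0.55, 0.45, 0.45]   # same side of 0.5, but barely

# Both vectors give identical thresholded predictions, hence identical accuracy
print(accuracy_rate(sharper, y), accuracy_rate(flatter, y))

# The Brier score separates them (lower is better)
print(brier_score(sharper, y), brier_score(flatter, y))
```

scikit-learn also provides `sklearn.metrics.brier_score_loss(y_true, y_prob)` for the same quantity, if that library is available.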
