I'd like to compare logistic regression to classification trees. As a first step, I compared the theoretical frameworks of the two classifiers. As a second step, I compared their performance on a rather unbalanced two-class data set, looking at confusion matrices, balanced accuracy, sensitivity, and specificity. I also compared the ROC curves and the AUC values derived from them. Are there any other value-adding measures for comparing the classifiers? And how would you compare efficiency in terms of running time?
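For concreteness, here is a minimal sketch of the comparison workflow described above, assuming scikit-learn; the synthetic data set, its 90/10 class split, and the model settings are illustrative assumptions, not the original data:

```python
# Sketch of the comparison described above: logistic regression vs. a
# classification tree on an imbalanced two-class problem, compared via
# confusion matrix, balanced accuracy, sensitivity, specificity, and AUC.
# The synthetic data and 90/10 imbalance are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (confusion_matrix, balanced_accuracy_score,
                             roc_auc_score)

X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("tree", DecisionTreeClassifier(max_depth=4))]:
    model.fit(X_tr, y_tr)
    p = model.predict_proba(X_te)[:, 1]   # predicted risks
    yhat = model.predict(X_te)            # hard 0/1 classifications
    tn, fp, fn, tp = confusion_matrix(y_te, yhat).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(name,
          f"bal.acc={balanced_accuracy_score(y_te, yhat):.3f}",
          f"sens={sensitivity:.3f}", f"spec={specificity:.3f}",
          f"AUC={roc_auc_score(y_te, p):.3f}")
```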
1 Answer
You seem to be using a mishmash of methods. Focus on obtaining predicted risks and using proper accuracy scores such as the Brier score or the logarithmic scoring rule (log likelihood; related to pseudo $R^2$). Things started going south when you chose to use classifiers rather than predictors. And note that single trees are highly unstable when the sample size is $< 100{,}000$ subjects, for example. That's why people use bagging, boosting, and random forests instead of single trees.
Proper accuracy scores are not destroyed by imbalanced $Y$.
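As a minimal, self-contained sketch of these two scoring rules, assuming scikit-learn (the labels and predicted risks below are made-up illustrative numbers):

```python
# Proper accuracy scoring rules applied to predicted risks rather than
# to hard classifications. Lower is better for both, and neither is
# distorted by class imbalance. The numbers here are illustrative only.
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

y_true = np.array([0, 0, 0, 1, 1])            # hypothetical outcomes
p_hat = np.array([0.1, 0.2, 0.3, 0.7, 0.9])   # hypothetical predicted risks

print("Brier score:", brier_score_loss(y_true, p_hat))  # mean (p - y)^2
print("log score:", log_loss(y_true, p_hat))             # -mean log likelihood
```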

Frank Harrell
- @FrankHarrell Thank you for the insightful comment! So you would, for example, recommend using pseudo $R^2$ to compare the performance? That would be rather straightforward, since I guess I can derive it directly from the confusion matrix. It would be a great measure, since I have an economics background and $R^2$ is often used there. – Patrick Balada Dec 30 '15 at 12:14
- And if I may ask: why do you call it a "proper" score? – Patrick Balada Dec 30 '15 at 15:26
- A proper accuracy scoring rule is one that gives the right rewards, i.e., it is optimized by a correct model. An improper accuracy score is optimized by choosing the wrong features and giving them the wrong weights. – Frank Harrell Dec 30 '15 at 15:41
- OK, but what is the advantage compared to AUC then? I really appreciate your help, but I'm not familiar with the Brier score yet. – Patrick Balada Dec 30 '15 at 16:27
- Proper accuracy scores have greater discrimination ability / power. The concordance probability ($c$-index; AUROC) is not able to powerfully compare two models because it uses only the ranks of the predictions and doesn't reward extreme predictions that are correct. – Frank Harrell Dec 30 '15 at 16:29
- Thank you! So, if I understand you correctly, an advantage of the Brier score is that whether I predict 0.9 or 0.95 and then classify the observation as 1 might make no difference to the AUC, but it does to the Brier score? – Patrick Balada Dec 30 '15 at 16:47
- Don't classify any observations. Derive an estimate of the probability of category membership. – Frank Harrell Dec 30 '15 at 17:01
- Do you by chance know any papers/sources that cover the distinction between AUC and Brier? – Patrick Balada Dec 30 '15 at 17:31
- My course notes go into this a bit, especially in the logistic regression chapter. See http://biostat.mc.vanderbilt.edu/rms and look under course materials. – Frank Harrell Dec 30 '15 at 17:32
- One last question: is the Brier score (MSE) appropriate for comparing, for example, the performance of classification trees to logistic regression? From what I have read so far, MSE is mostly used for regression trees. – Patrick Balada Dec 30 '15 at 22:40
- The Brier score will be excellent for this purpose, given that you have proportions of $Y=1$ for each tree node and given that you use resampling to correct for overfitting or have a huge holdout sample. – Frank Harrell Dec 31 '15 at 00:03 [a cross-validated sketch of this appears below the thread]
- Thank you, I will follow your advice and use the Brier score. So, would it be correct to say that the area under the ROC curve is a somewhat biased measure, since it is hard to tell whether it is good or not without any information about the class distribution in my test set (a little like accuracy)? – Patrick Balada Dec 31 '15 at 09:45
- No, the $c$-index (AUROC), because it is a concordance probability, is an excellent measure of pure discrimination that works for extremely unbalanced $Y$. It's just that it is not sensitive enough to be used to compare two predictive methods. – Frank Harrell Dec 31 '15 at 15:21
- Sorry, I guess I still have some issues understanding the intuition behind it. Why exactly is it then more sensitive? – Patrick Balada Dec 31 '15 at 15:37
- The concordance probability (AUROC) is _less_ sensitive. Think of a pair of predicted probabilities, 0.2 and 0.8, corresponding to a non-event and an event. Then think of a pair 0.1 and 0.9. Both get one point in the concordance calculation, even though the second goes out on a limb. You can also think about this from the standpoint that the concordance probability is essentially the Wilcoxon-Mann-Whitney statistic for comparing two groups ($Y=0$, $Y=1$). Comparing two $c$-indexes is essentially the same as subtracting one Wilcoxon statistic from another, which no one does. – Frank Harrell Dec 31 '15 at 21:06
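To make this last point concrete, a tiny sketch (assuming scikit-learn) of the two hypothetical pairs from the comment above: both rank the event above the non-event, so the concordance is identical, while the Brier score rewards the bolder pair.

```python
# Both prediction pairs rank the event above the non-event, so the
# concordance probability (AUROC) is 1.0 for each, but the Brier score
# rewards the pair that "goes out on a limb" with correct extreme values.
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

y = np.array([0, 1])              # one non-event, one event
cautious = np.array([0.2, 0.8])
bold = np.array([0.1, 0.9])

print(roc_auc_score(y, cautious), roc_auc_score(y, bold))        # 1.0 1.0
print(brier_score_loss(y, cautious), brier_score_loss(y, bold))  # 0.04 0.01
```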
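And as a sketch of the earlier comment about using resampling to correct for overfitting: a cross-validated Brier-score comparison of logistic regression against a single classification tree. The synthetic data and all settings are illustrative assumptions, and the "neg_brier_score" scorer requires a reasonably recent scikit-learn.

```python
# Cross-validated Brier score for logistic regression vs. a single tree.
# 10-fold CV is the resampling step that corrects the optimism of
# scoring a model on its own training data. Data are synthetic.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)

for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("tree", DecisionTreeClassifier(max_depth=4))]:
    neg_brier = cross_val_score(model, X, y, cv=10, scoring="neg_brier_score")
    print(name, "cross-validated Brier score:", -neg_brier.mean())
```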