
I have three possible "final" models from a binary logistic regression (N = 176, number of events = 36).

Now I am trying to decide which one to select. It's clear that "all models are wrong, but some are useful", but I have to decide anyway. My goal is prediction, so I favor parsimony over complexity.

Which criterion should I use? Corrected AIC (AICc) in combination with an adjusted pseudo-R²? And then AUC with a calibration curve for each model in order to compare them?
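For concreteness, here is a minimal sketch of how these criteria could be computed for one candidate model in Python; the data below are random placeholders, and all variable names are assumptions rather than anything from the question:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

# Placeholder data -- replace with your own design matrix and 0/1 outcome.
rng = np.random.default_rng(42)
X = rng.normal(size=(176, 3))
y = rng.binomial(1, 0.2, size=176)

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
probs = fit.predict()                    # in-sample predicted probabilities

n = len(y)
k = fit.df_model + 1                     # parameters, including the intercept
aicc = fit.aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample corrected AIC

# Nagelkerke's pseudo-R² (Cox-Snell rescaled to a 0-1 range)
cox_snell = 1 - np.exp(2 * (fit.llnull - fit.llf) / n)
nagelkerke = cox_snell / (1 - np.exp(2 * fit.llnull / n))

auc = roc_auc_score(y, probs)            # discrimination (C-statistic)
obs, pred = calibration_curve(y, probs, n_bins=5)  # points for a calibration plot

print(f"AICc = {aicc:.2f}, Nagelkerke R² = {nagelkerke:.3f}, AUC = {auc:.3f}")
```

Note that AUC and calibration computed on the same data the model was fit to will be optimistic; the cross-validation suggested in the comments below addresses that.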

  • If you are mostly interested in prediction, you may get the best results not by selecting a model, but by *combining* them: take averages of predicted probabilities, or use majority votes. Test this on holdout test data (see the sketch after these comments). – Stephan Kolassa May 26 '16 at 11:28
  • Thank you for your suggestion. It's definitely a good idea, but let's say I have to decide anyway (again, choosing the "best" option when all of them are bad). Is it correct that in this case, when prediction is my goal, comparing measures of predictive power (such as the Brier score, adjusted pseudo-R², C-statistic, calibration curve, etc.) between models would be a better option than just looking at goodness-of-fit statistics and the AIC? – Juraj May 26 '16 at 13:56
  • Well, keeping in mind that you are still most interested in prediction, I'd recommend that you decide on a quality measure (are false positives more painful than false negatives, or vice versa?), then cross-validate your three candidate models. [Like this, although that question is not a straight-up duplicate.](http://stats.stackexchange.com/a/214125/1352) – Stephan Kolassa May 26 '16 at 19:39
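Putting the two comments together: score each candidate out of sample with one agreed-upon quality measure (the Brier score is one natural choice for predicted probabilities), and compare that against a simple average of the candidates' predictions. A minimal sketch, assuming the three "final" models are hypothetical column subsets of a single predictor matrix:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import brier_score_loss

# Placeholder data -- replace with your own predictors and 0/1 outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(176, 5))
y = rng.binomial(1, 0.2, size=176)

# Assumed: each candidate "final" model uses a subset of the columns.
candidates = {
    "model_1": [0, 1],
    "model_2": [0, 1, 2],
    "model_3": [0, 2, 3, 4],
}

cv_probs = {}
for name, cols in candidates.items():
    # Out-of-fold predicted probabilities from 10-fold cross-validation
    p = cross_val_predict(LogisticRegression(max_iter=1000), X[:, cols], y,
                          cv=10, method="predict_proba")[:, 1]
    cv_probs[name] = p
    print(f"{name}: cross-validated Brier score = {brier_score_loss(y, p):.4f}")

# Simple ensemble: average the out-of-fold predicted probabilities
p_avg = np.mean(list(cv_probs.values()), axis=0)
print(f"average ensemble: Brier score = {brier_score_loss(y, p_avg):.4f}")
```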

0 Answers