I built a binary classifier using logistic regression, but I can't seem to reconcile the results in my mind. After cross-validation, the model's AUC is 0.9003. But, as a sanity check, I ran a goodness-of-fit (GOF) check using the Hosmer-Lemeshow test, and I found that the model did not fit the data well at all! (The p-value on the $\chi^2$ test was very small.)
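
For concreteness, here is a minimal sketch of this kind of workflow, assuming scikit-learn and synthetic data (so the numbers won't match mine). Neither scikit-learn nor scipy ships a Hosmer-Lemeshow test, so the function below is a hand-rolled version of the usual decile-based statistic:

```python
import numpy as np
from scipy.stats import chi2
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real data.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# Discrimination: cross-validated AUC.
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
print(f"Cross-validated AUC: {auc:.4f}")

def hosmer_lemeshow(y_true, p_hat, g=10):
    """Decile-based Hosmer-Lemeshow statistic and p-value."""
    order = np.argsort(p_hat)
    y_true, p_hat = y_true[order], p_hat[order]
    stat = 0.0
    for idx in np.array_split(np.arange(len(p_hat)), g):
        observed = y_true[idx].sum()   # events observed in this bin
        expected = p_hat[idx].sum()    # events the model expects in this bin
        n = len(idx)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n))
    # The conventional reference distribution is chi-square with g - 2 df.
    return stat, chi2.sf(stat, df=g - 2)

# Calibration: HL test on in-sample predicted probabilities.
p_hat = model.fit(X, y).predict_proba(X)[:, 1]
hl_stat, hl_p = hosmer_lemeshow(y, p_hat)
print(f"Hosmer-Lemeshow: chi2 = {hl_stat:.2f}, p = {hl_p:.4f}")
```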

I'm confused about whether I should use a GOF test to assess the quality of my model. If it predicts really well, do I really care if the GOF is bad?

Is the GOF measurement only relevant if I care about interpreting the results of the model, such as the parameter estimates?

This article highlights the differences between the two, but doesn't go into how to use them in practice: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3575184/

makansij
  • This is indeed possible. In a classification setting you have the luxury of underfitting if the classes are easily separable: you will still separate the classes, but the fit will not be good, and that is not necessarily a bad thing (see the sketch after these comments). See the related discussion http://stats.stackexchange.com/questions/208529/when-is-it-appropriate-to-use-an-improper-scoring-rule/251654#251654 – Cagdas Ozgenc Dec 19 '16 at 06:20
  • Does it affect the interpretation? – makansij Dec 20 '16 at 22:29
  • For that matter, we don't really care if it's overfit or underfit. Either one can make the fit bad while the predictions are still pretty good. – makansij Dec 23 '16 at 23:42
  • Hosmer-Lemeshow is considered obsolete: https://stats.stackexchange.com/questions/273966/logistic-regression-with-poor-goodness-of-fit-hosmer-lemeshow – kjetil b halvorsen May 14 '20 at 12:14
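
To make the separability point in the first comment concrete, here is a small sketch (again on synthetic data, reusing the hand-rolled `hosmer_lemeshow` function from the snippet above): a heavy L2 penalty on well-separated classes barely changes the ranking of the predictions, so AUC stays high, but it shrinks the predicted probabilities toward the base rate, which is exactly the miscalibration that Hosmer-Lemeshow flags.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Well-separated classes.
X, y = make_classification(n_samples=2000, n_features=5, class_sep=3.0,
                           random_state=1)

# A very strong L2 penalty (small C) shrinks the coefficients: the ranking
# of the predictions is roughly preserved, but the probabilities are pulled
# toward 0.5 and are no longer calibrated.
shrunk = LogisticRegression(C=1e-3, max_iter=1000).fit(X, y)
p = shrunk.predict_proba(X)[:, 1]

print("AUC:", roc_auc_score(y, p))              # still close to 1
print("Probability range:", p.min(), p.max())   # squeezed toward 0.5
print("HL p-value:", hosmer_lemeshow(y, p)[1])  # typically very small
```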
