Improve the goodness-of-fit of a logistic regression

Question

I've been working on a data set with a binary outcome. I fit a logistic regression of the outcome on several covariates, all of which are categorical variables.

I tried to assess the goodness-of-fit of the logistic model using the Pearson chi-square and deviance statistics, but both tests reject the hypothesis that the model fits the data (p-values < 0.0001). I also ran the Hosmer and Lemeshow (HL) test, and the result is similar, with p-value < 0.0001. To improve the model's fit, I then added interactions between the main effects. Even after adding all possible interactions, the fit did not improve much and the p-values are still < 0.0001.

Deviance and Pearson Goodness-of-Fit Statistics

Criterion   Value        DF   Value/DF  Pr > ChiSq
Deviance    29693.7476  4384    6.7732  <.0001
Pearson     31175.7340  4384    7.1113  <.0001

Number of unique profiles: 4409
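As a quick check on the table above, the Value/DF column is just the statistic divided by its degrees of freedom, and the p-value is the upper tail of a chi-square distribution with that many degrees of freedom. The sketch below reproduces both using only the Python standard library. The function names are illustrative, and the tail probability uses the Wilson-Hilferty normal approximation rather than an exact chi-square routine, so treat it as a rough check, not as what the software computed.

```python
import math

def dispersion(stat, df):
    """Ratio reported in the Value/DF column."""
    return stat / df

def chi2_upper_tail_approx(stat, df):
    """Approximate P(chi2_df > stat) with the Wilson-Hilferty
    cube-root normal approximation (an assumption of this sketch)."""
    c = 2.0 / (9.0 * df)
    z = ((stat / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    # Upper tail of the standard normal; underflows to 0.0 for huge z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Values copied from the goodness-of-fit table in the question.
reported = {
    "Deviance": (29693.7476, 4384),
    "Pearson": (31175.7340, 4384),
}

for name, (stat, df) in reported.items():
    ratio = dispersion(stat, df)
    p_value = chi2_upper_tail_approx(stat, df)
    print(f"{name}: Value/DF = {ratio:.4f}, p < 0.0001: {p_value < 1e-4}")
```

A ratio near 1 is what one would expect if the model fit the grouped data well; ratios near 7 are far in the tail for 4384 degrees of freedom.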

I remember that we can always make the model fit by making it more complex, but I'm wondering how I should proceed. Does it make sense to include polynomial terms of the categorical variables? Or to make the model non-linear? (I'm reluctant to do either, as it would make the results hard to interpret.)

Here is a brief summary of the data I'm working on:

Number of Observations  495851
Number of Events        105069

I'm also wondering whether the sample size is so large that it's hard to find a relatively simple model that fits the data.
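One way to see why sample size matters here: for grouped binary data, the expected Pearson statistic has a part that stays roughly constant and a part that grows linearly with the number of observations per profile whenever the model probabilities differ at all from the true ones. The sketch below computes that expectation exactly for fixed (not estimated) model probabilities. The four profiles and their probabilities are made-up numbers for illustration only, not taken from the data in the question.

```python
def expected_pearson(true_p, model_p, n_per_profile):
    """Expected Pearson chi-square over profiles when each profile has
    n_per_profile Bernoulli trials with true success probability p,
    and the model (held fixed) predicts probability q."""
    total = 0.0
    for p, q in zip(true_p, model_p):
        # Sampling noise: does not depend on n.
        variance_term = p * (1.0 - p) / (q * (1.0 - q))
        # Misfit: grows linearly with n for any p != q.
        bias_term = n_per_profile * (p - q) ** 2 / (q * (1.0 - q))
        total += variance_term + bias_term
    return total

# Hypothetical profiles: the model is off by only 0.01 in each one.
true_p = [0.20, 0.22, 0.18, 0.25]
model_p = [0.21, 0.21, 0.19, 0.24]

for n in (100, 10_000, 1_000_000):
    print(n, round(expected_pearson(true_p, model_p, n), 2))
```

With a perfectly specified model the expectation equals the number of profiles regardless of n. With a misfit of 0.01 per profile, the statistic is unremarkable at small n and becomes very large at large n, so a tiny p-value in a large sample does not by itself say the misfit is practically important.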

Thanks!

It may be good to take a look at this question: http://stats.stackexchange.com/questions/169000/goodness-of-fit-test-in-logistic-regression-which-fit-do-we-want-to-test (Dec 29 '15 at 16:58)
