I am running a logistic regression model in R and want to assess its goodness of fit, since `summary()` does not report an F-test value the way it does for linear regression models.

So I used the following command:

summary( glm( vomiting ~ age, family = binomial(link = logit) ) )

# Call:
# glm(formula = vomiting ~ age, family = binomial(link = logit))

# Deviance Residuals:
#    Min       1Q   Median       3Q      Max 
# -1.0671  -1.0174  -0.9365   1.3395   1.9196 

#  Coefficients:
#               Estimate Std. Error z value Pr(>|z|)   
#  (Intercept) -0.141729   0.106206  -1.334    0.182   
#  age         -0.015437   0.003965  -3.893 9.89e-05 ***
#  ---
#  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#  (Dispersion parameter for binomial family taken to be 1)
#  Null deviance: 1452.3  on 1093  degrees of freedom
#  Residual deviance: 1433.9 on 1092  degrees of freedom

#  AIC: 1437.9
#  Number of Fisher Scoring iterations: 4

Then, based on a suggestion I picked up from someone else, I ran the following and got:

 1-pchisq(1452.3-1433.9, 1093-1092)
 # [1] 1.79058e-05

Could someone explain in detail what the null and alternative hypotheses are here, and what this value of 1.79058e-05 means in this case?
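For reference, the same statistic can be computed directly from the fitted model object instead of retyping the deviances by hand. A minimal sketch, using simulated data as a stand-in since the original vomiting/age data set is not shown:

```r
# Simulated stand-in for the original data (assumption: the real
# vomiting/age data are not available here)
set.seed(1)
n   <- 1094
age <- runif(n, 18, 80)
vomiting <- rbinom(n, 1, plogis(-0.14 - 0.015 * age))

fit <- glm(vomiting ~ age, family = binomial(link = logit))

# Likelihood ratio test of H0: beta_age = 0 (the null model,
# vomiting ~ 1) against H1: beta_age != 0 (the fitted model)
lr_stat <- fit$null.deviance - fit$deviance
df      <- fit$df.null - fit$df.residual
p_value <- pchisq(lr_stat, df, lower.tail = FALSE)

# Equivalent built-in route:
anova(fit, test = "Chisq")
```

`lower.tail = FALSE` is the numerically safer equivalent of `1 - pchisq(...)`, and `anova()` reproduces the same deviance comparison without manual subtraction.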

Eric
    I suggest to use the Hosmer-Lemeshow goodness of fit test for logistic regression which is implemented in the `ResourceSelection` library with the `hoslem.test` function. See: http://thestatsgeek.com/2014/02/16/the-hosmer-lemeshow-goodness-of-fit-test-for-logistic-regression/ – Marco Sandri Oct 22 '17 at 09:32
  • Thank you, but I am a bit confused. So from the above example, if my final 1-pchisq(1452.3-1433.9, 1093-1092) is 1.79058e-05, does this mean my logistic regression model is a good fit or not? – Eric Oct 22 '17 at 14:14
    I agree with @RuiBarradas. The test that you are using is not a goodness-of-fit test but a likelihood ratio test for the comparison of the proposed model with the null model. P=1.79058e-05 means that the fit of your model is significantly better than the fit of the null model – Marco Sandri Oct 22 '17 at 14:38
    Like @MarcoSandri says, your model is significantly better than the model `vomiting ~ 1`, which is basically a computation of the means. Read the link I've provided you with. – Rui Barradas Oct 22 '17 at 15:40
  • @Rui Barradas: Thank you but it seems like your provided link is gone now.. – Eric Oct 22 '17 at 16:58
  • @Marco Sandri: Thank you. So when I officially report this result, if I want to call it some sort of 'chi-square test' result, shall I simply put out the pchisq(1452.3-1433.9, 1093-1092) without subtracting it from 1 and report this as an evidence for a type of goodness-of-fit? The Wikipedia says "a likelihood ratio test is a statistical test used for comparing the goodness of fit of two statistical models". – Eric Oct 22 '17 at 16:58
    @Eric No. If you want to make a goodness-of-fit test on your logistic regression model, use the Hosmer-Lemeshow test: https://en.wikipedia.org/wiki/Hosmer%E2%80%93Lemeshow_test The test statistic asymptotically follows a chi-square distribution. – Marco Sandri Oct 22 '17 at 17:05
  • @Marco Sandri: Thank you again. I tried running the test using the hoslem.test() function in R. If the p-value is greater than 0.05, does this mean my model is a good fit? – Eric Oct 22 '17 at 17:19
    @Eric The Hosmer–Lemeshow test determines whether the differences between observed and expected proportions are significant. If your p is greater than 0.05, then you can say that you have a good fit. – Marco Sandri Oct 22 '17 at 17:24
  • @Marco Sandri: However, there is a post that uses my method https://www.r-bloggers.com/veterinary-epidemiologic-research-glm-evaluating-logistic-regression-models-part-3/ – Eric Oct 22 '17 at 17:25
  • @Marco Sandri: But the strange thing is that using your method and my method sometimes give different conclusions. – Eric Oct 22 '17 at 17:25
  • @Marco Sandri: For instance, if I call my logistic regression "test", then I run "hoslem.test(test$y, fitted(test))", but in some cases the conclusion is not a good fit while 1-pchisq(1452.3-1433.9, 1093-1092) indicates a good fit. Should I ignore my method and follow yours? – Eric Oct 22 '17 at 17:29
  • @Marco Sandri: Oh sorry my mistake..both give the same conclusion. I was confused with the value of "1" from my method as good fit..Thank you so much anyways! – Eric Oct 22 '17 at 17:32
  • @Marco Sandri: Yes, but no worries, it was my misunderstanding. Both give the same conclusion. – Eric Oct 22 '17 at 17:33
    Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/67473/discussion-between-marco-sandri-and-eric). – Marco Sandri Oct 22 '17 at 17:34
  • @Eric Link: [Likelihood ratio test in R](https://stats.stackexchange.com/questions/6505/likelihood-ratio-test-in-r). – Rui Barradas Oct 22 '17 at 17:39
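The Hosmer–Lemeshow test discussed in the comments can be run as follows. A hedged sketch, again on simulated data since the original data set is not shown:

```r
# install.packages("ResourceSelection")  # if not already installed
library(ResourceSelection)

# Simulated stand-in for the original vomiting/age data
set.seed(1)
n   <- 1094
age <- runif(n, 18, 80)
vomiting <- rbinom(n, 1, plogis(-0.14 - 0.015 * age))

fit <- glm(vomiting ~ age, family = binomial(link = logit))

# Hosmer-Lemeshow test with the default g = 10 groups.
# H0 here is that the model fits well, so a LARGE p-value (> 0.05)
# indicates no evidence of lack of fit -- the opposite reading
# from the likelihood ratio test above.
hoslem.test(fit$y, fitted(fit), g = 10)
```

Note the reversed interpretation relative to the likelihood ratio test: for Hosmer–Lemeshow, a small p-value is evidence *against* the model's fit.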

1 Answer

I suggest using the Hosmer-Lemeshow goodness-of-fit test for logistic regression, which is implemented in the `ResourceSelection` package via the `hoslem.test` function. See: thestatsgeek.com/2014/02/16/ - Marco Sandri

But as @kjetilbhalvorsen points out below, Frank Harrell disagrees:

The Hosmer-Lemeshow test is to some extent obsolete because it requires arbitrary binning of predicted probabilities and does not possess excellent power to detect lack of calibration. It also does not fully penalize for extreme overfitting of the model. Better methods are available such as

Hosmer, D. W.; Hosmer, T.; le Cessie, S. & Lemeshow, S. A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine, 1997, 16, 965-980

Their new measure is implemented in the R rms package.
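For illustration, the `rms` package exposes the le Cessie–van Houwelingen–Copas–Hosmer unweighted sum-of-squares test through the residual method of `lrm` fits. A minimal sketch on simulated data (the original data are not shown):

```r
# install.packages("rms")  # if not already installed
library(rms)

# Simulated stand-in for the original vomiting/age data
set.seed(1)
n   <- 1094
age <- runif(n, 18, 80)
vomiting <- rbinom(n, 1, plogis(-0.14 - 0.015 * age))

# lrm() fits the logistic model; x = TRUE, y = TRUE stores the
# design matrix and response, which the residual method requires
fit <- lrm(vomiting ~ age, x = TRUE, y = TRUE)

# Global goodness-of-fit test: le Cessie - van Houwelingen -
# Copas - Hosmer unweighted sum of squared errors
residuals(fit, type = "gof")
```

Unlike Hosmer–Lemeshow, this test requires no arbitrary binning of the predicted probabilities, which is the main objection quoted above.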

mkt
    I've copied this comment by @MarcoSandri as a community wiki answer because the comment is, more or less, an answer to this question. We have a dramatic gap between answers and questions. At least part of the problem is that some questions are answered in comments: if comments which answered the question were answers instead, we would have fewer unanswered questions. – mkt Aug 19 '18 at 11:53
    Please see this answer https://stats.stackexchange.com/questions/18750/hosmer-lemeshow-vs-aic-for-logistic-regression by Frank Harrell for a diverging opinion! He seems to think Hosmer-Lemeshow is obsolete. – kjetil b halvorsen Aug 31 '18 at 08:35
    @kjetilbhalvorsen Thanks, edited to include that disagreement. – mkt Aug 31 '18 at 08:43