Testing for overdispersion in logistic regression

Question

R in Action (Kabacoff, 2011) suggests the following routine to test for overdispersion in a logistic regression:

Fit a logistic regression using the binomial distribution:

model_binom <- glm(Species=="versicolor" ~ Sepal.Width,
                   family=binomial(), data=iris)

Fit a logistic regression using the quasibinomial distribution:

model_overdispersed <- glm(Species=="versicolor" ~ Sepal.Width, 
                           family=quasibinomial(), data=iris)

Use a chi-squared test for overdispersion:

pchisq(summary(model_overdispersed)$dispersion * model_binom$df.residual, 
       model_binom$df.residual, lower.tail = FALSE)
# [1] 0.7949171

Could somebody explain how and why the chi-squared distribution is being used to test for overdispersion here? The p-value is 0.79; how does this show that overdispersion is not a problem in the binomial model?

It is pretty hard for the Bernoulli distribution not to fit unless you have correlated observations. What about the fit do you suspect is inadequate? — Frank Harrell, Mar 24 '14 at 03:26
By correlated observations do you mean that each Bernoulli trial is not independent? — luciano, Mar 26 '14 at 19:23
Yes, e.g. serial or within-cluster correlation; non-independent trials. — Frank Harrell, Mar 26 '14 at 20:19

score 5 · Answer 1 · edited Mar 11 '15 at 13:51

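The approach described requires unnecessary computations. The test statistic is just

sum(residuals(model_binom, type = "pearson")^2)

This is exactly the Pearson $\chi^2$ lack-of-fit statistic: the quasibinomial dispersion estimate is this sum divided by the residual degrees of freedom, so multiplying it back by `model_binom$df.residual` recovers the statistic. Under the null hypothesis that the model fits, it has approximately a chi-squared distribution with `model_binom$df.residual` degrees of freedom, which is why `pchisq()` is used.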
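To see why the `pchisq()` call in the question works, the following base-R sketch (reusing the question's model on the built-in `iris` data; the variable names are my own) computes the Pearson statistic directly, checks it against the quasibinomial dispersion times the residual degrees of freedom, and reproduces the upper-tail p-value. The two values should agree up to the IRLS convergence tolerance rather than bit-for-bit. For contrast, it also shows that squared deviance residuals sum to the residual deviance, which is a different statistic.

```r
# Fit the same model with both families; binomial and quasibinomial give
# identical coefficients, only the dispersion handling differs
model_binom <- glm(Species == "versicolor" ~ Sepal.Width,
                   family = binomial(), data = iris)
model_quasi <- glm(Species == "versicolor" ~ Sepal.Width,
                   family = quasibinomial(), data = iris)

df_res <- model_binom$df.residual  # n - p = 150 - 2 = 148

# Pearson chi-squared statistic: sum of squared Pearson residuals
pearson_stat <- sum(residuals(model_binom, type = "pearson")^2)

# The quasibinomial dispersion estimate is pearson_stat / df_res, so
# multiplying it by df_res should give (numerically) the same value
stat_from_dispersion <- summary(model_quasi)$dispersion * df_res
print(c(pearson = pearson_stat, from_dispersion = stat_from_dispersion))

# Dividing by df_res instead gives the dispersion estimate itself
# (close to 1 without overdispersion); it is not the test statistic
dispersion_hat <- pearson_stat / df_res

# Upper-tail chi-squared p-value, as in the question
p_value <- pchisq(pearson_stat, df = df_res, lower.tail = FALSE)
print(p_value)

# For contrast: squared deviance residuals sum to the residual deviance
deviance_stat <- sum(residuals(model_binom, type = "deviance")^2)
print(c(deviance_stat = deviance_stat,
        residual_deviance = deviance(model_binom)))
```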

Overdispersion as such doesn't apply to Bernoulli data. A large value of $\chi^2$ could instead indicate missing covariates, missing polynomial or interaction terms, or that the data should be grouped. A p-value of 0.79 indicates the test failed to find any problems.
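One way to act on "the data should be grouped" is to collapse the 0/1 responses into binomial counts per covariate pattern, where extra-binomial variation can actually show up. The sketch below assumes that grouping by each distinct `Sepal.Width` value is appropriate; that is an illustrative choice, not something the answer specifies. Many of these groups are still small, so the chi-squared approximation remains rough (with ungrouped 0/1 data it is known to be poor).

```r
# Binary response: TRUE if the flower is versicolor
is_versicolor <- iris$Species == "versicolor"

# Collapse the 150 Bernoulli trials into binomial counts, one row per
# distinct Sepal.Width value (the covariate pattern)
successes <- tapply(is_versicolor, iris$Sepal.Width, sum)
trials <- tapply(is_versicolor, iris$Sepal.Width, length)
grouped <- data.frame(Sepal.Width = as.numeric(names(trials)),
                      successes = as.vector(successes),
                      trials = as.vector(trials))

# Binomial GLM on the grouped counts (successes, failures)
model_grouped <- glm(cbind(successes, trials - successes) ~ Sepal.Width,
                     family = binomial(), data = grouped)

# Pearson statistic and dispersion ratio now refer to grouped counts,
# where variation beyond the binomial is meaningful
df_grouped <- model_grouped$df.residual
pearson_grouped <- sum(residuals(model_grouped, type = "pearson")^2)
dispersion_ratio <- pearson_grouped / df_grouped
p_grouped <- pchisq(pearson_grouped, df = df_grouped, lower.tail = FALSE)
print(c(dispersion_ratio = dispersion_ratio, p_value = p_grouped))
```

The grouped fit has the same coefficients as the ungrouped logistic regression, but far fewer residual degrees of freedom, because each distinct `Sepal.Width` value now contributes a single binomial observation.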

answered Mar 11 '15 at 13:47 by oleh; edited by Sycorax

Shouldn't the answer above be modified as follows? `sum(residuals(model_binom, type = "deviance")^2)/model_binom$df.residual` – Steve VW Jan 24 '17 at 18:44

score 1 · Answer 2 · Florian Hartig · 2021-01-27T16:36:57.203

As @oleh says, the $\chi^2$ test is basically a general goodness-of-fit (GOF) test: it will be triggered by overdispersion, but it can also be triggered by other problems.

You can test specifically for overdispersion in binomial GLMs with the DHARMa R package (disclaimer: I'm the developer), which compares the dispersion in the data with the dispersion of simulated data from the fitted model.

model_binom <- glm(Species=="versicolor" ~ Sepal.Width,
                   family=binomial(), data=iris)
library(DHARMa)
testDispersion(model_binom)
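The simulation idea behind this kind of test (compare the dispersion in the observed data with the dispersion of data simulated from the fitted model) can be sketched in base R as a parametric bootstrap. This is not DHARMa's implementation: using the Pearson statistic as the dispersion summary, refitting to each simulated response, 200 simulations, and the seed are all my own assumptions. `simulate()` draws new 0/1 responses from the fitted model.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

model_binom <- glm(Species == "versicolor" ~ Sepal.Width,
                   family = binomial(), data = iris)

# Dispersion summary used in this sketch: the Pearson chi-squared statistic
pearson_stat <- function(model) sum(residuals(model, type = "pearson")^2)
observed <- pearson_stat(model_binom)

# Simulate responses from the fitted model, refit, and recompute the summary
n_sim <- 200
sims <- simulate(model_binom, nsim = n_sim)  # data frame, one column per draw
simulated <- vapply(sims, function(y_sim) {
  refit <- glm(y_sim ~ Sepal.Width, family = binomial(), data = iris)
  pearson_stat(refit)
}, numeric(1))

# One-sided p-value: share of simulated values at least as large as observed
p_sim <- mean(simulated >= observed)
print(c(observed = observed, p_value = p_sim))
```

Because the reference distribution is simulated rather than taken from the chi-squared approximation, the same recipe can be reproduced in any software that can simulate from and refit a fitted GLM.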

However, note that a raw 0/1 response in a logistic regression cannot be overdispersed, so this test (like any other dispersion test) will never be significant. Overdispersion tests on a 0/1 response only make sense if you group the residuals; see the comments specific to binomial responses in the DHARMa vignette.

answered Jan 27 '21 at 15:15 by Florian Hartig; edited Jan 27 '21 at 16:36

This should be a comment. There isn't any information about how to test for overdispersion here, just information about a function / package that will conduct the test for you. There's nothing wrong with that, but it isn't an answer by our standards--it's a useful comment. – gung - Reinstate Monica Jan 27 '21 at 15:37
OK, I have expanded this a bit, hope it is more helpful now. – Florian Hartig Jan 27 '21 at 15:45
This is still a comment, IMHO. What does `testDispersion()` do? How does it work? If I used, say, Minitab, I could sum the squares of the deviance residuals and compare the result to the chi-squared distribution (from the other answer). How would I reproduce this strategy in other software from the information given here? There's nothing wrong with including R code (I do it all the time), but that shouldn't be all there is in an answer. – gung - Reinstate Monica Jan 27 '21 at 15:49
I have made a final addition to clarify the idea, but I agree that it's not an in-depth explanation of the test (which is of course documented in the help). I don't think it's useful to go more in depth here, as the OP didn't ask about this, but a different test. I had still thought that the info is relevant for the OP, as it seems he wants to test for overdispersion in a logistic regression. Feel free to transform this to a comment or remove if you feel it's not relevant here. – Florian Hartig Jan 27 '21 at 16:40
Thanks. This is probably enough to be an answer. +1 – gung - Reinstate Monica Jan 27 '21 at 16:51
