For an adequate fit of GLM models

Question

I am trying to understand how to test goodness of fit for a GLM. I am using an example from the book by Davison and Hinkley (1997). In R, they fit the following model:

data(cane) # this dataset is available by default
cane.glm <- glm(y ~ block+var,family=binomial,data=cane)
summary(cane.glm)

Then they write that "for an adequate fit, the deviance would roughly be distributed according to a $\chi^2_{132}$ (where $132$ is the residual degrees of freedom of the regression); in fact, it is 1142.8. This indicates overdispersion relative to the model." However, they do not give any further details on how to proceed.

I think I need to compute the deviance with `deviance(cane.glm)` and use the `pchisq` function to test whether the observed value is far out in the tail of the theoretical distribution. However, I don't understand the exact testing procedure. Any example in R would be much appreciated.
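A minimal sketch of the calculation described above, assuming the deviance (1142.8) and residual degrees of freedom (132) quoted from the book. It only evaluates the upper-tail probability of the reference chi-square distribution; the variable names are illustrative.

```r
# Values quoted from the book passage above
dev <- 1142.8   # residual deviance of the fitted model
df  <- 132      # residual degrees of freedom

# Upper-tail probability of a chi-square(df) variable at the observed deviance.
# A very small value means the deviance is far larger than an adequate fit
# would be expected to produce.
p_value <- pchisq(dev, df = df, lower.tail = FALSE)
p_value
```

With a fitted model object, the same quantity could be obtained with `pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE)`, where `fit` stands for the `glm` object.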

In addition, this reference (p.8) suggests that this type of "goodness of fit" test "does not actually work very well." Why not? What else should I use?

Why would you need to test anything? 1142.8 is plainly very deep into the tail of a chi-square with 132 d.f. (which has mean 132 and s.d. $\sqrt{2\times 132}\approx 16.25$)... what more would a test tell you? But in any case the question isn't "are these data drawn from this specific Poisson GLM?" (they aren't and with enough data a test would always tell you so) but whether the model is a suitable description, and for that, knowing that the deviance is nearly 9 times as big as you'd expect it to be (with consequent impact on inference) is the thing that leads us to look for what's wrong. — Glen_b, Jul 15 '17 at 23:07
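The arithmetic in this comment can be checked with a few lines of R. This is only a sketch using the two numbers quoted in the question; the variable names are illustrative.

```r
dev <- 1142.8   # observed residual deviance (from the question)
df  <- 132      # residual degrees of freedom (from the question)

chi_mean <- df             # mean of a chi-square with df degrees of freedom
chi_sd   <- sqrt(2 * df)   # its standard deviation, about 16.25

ratio <- dev / df                    # deviance relative to its expected value
z     <- (dev - chi_mean) / chi_sd   # distance from the mean in s.d. units

round(c(mean = chi_mean, sd = chi_sd, ratio = ratio, z = z), 2)
```

The ratio of about 8.7 is the "nearly 9 times" figure, and the deviance sits roughly 62 standard deviations above the mean of the reference distribution.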
It's a pity the author of your linked notes doesn't offer a full reference for Hosmer et al 1997 which would probably explain the "doesn't work well" claim. I wonder if the intent was "A comparison of goodness of fit tests for the logistic regression model", *Statistics in Medicine*, **16**, 965-980 ... If that's the reference, its relevance to Poisson regression is not immediately clear, however. — Glen_b, Jul 15 '17 at 23:47
I don't see a `cane` dataset that is loaded automatically. Are you referring to [?cane](https://stat.ethz.ch/R-manual/R-devel/library/boot/html/cane.html) in the `boot` package? There is no `y` variable in that dataset. How did you construct it, `y = with(cane, cbind(r, n-r))`? — gung - Reinstate Monica, Aug 05 '17 at 21:30
I think you will find the information you need in the linked thread. Please read it. If it isn't what you want / you still have a question afterwards, come back here & edit your question to state what you learned & what you still need to know. Then we can provide the information you need without just duplicating material elsewhere that already didn't help you. Beyond the duplicated question, you have a request for an R code demonstration, which is off topic here but actually exists at the linked thread nonetheless, & a question based on an obscure comment in a linked source w/o context. — gung - Reinstate Monica, Aug 06 '17 at 20:44
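For readers who want to reproduce the numbers, here is a hedged sketch. It assumes that `cane` is the data set from the `boot` package and that the response is `r` events out of `n`, as guessed in the comment above; the question itself does not confirm how `y` was built.

```r
library(boot)  # assumption: `cane` comes from the boot package

data(cane, package = "boot")

# Assumed response: r "successes" out of n trials per plot
cane.glm <- glm(cbind(r, n - r) ~ block + var, family = binomial, data = cane)

dev <- deviance(cane.glm)      # residual deviance
df  <- df.residual(cane.glm)   # residual degrees of freedom

c(deviance = dev, df = df, ratio = dev / df)

# Upper-tail chi-square probability for the residual deviance
pchisq(dev, df, lower.tail = FALSE)
```

If the assumed construction matches the book's, the residual degrees of freedom should come out as 132, in line with the quoted passage.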
