
On this page, I am interested in the section “goodness of fit”, which is near the bottom of the page and contains the table of deviance functions.

The author states that the scaled deviance, i.e. $D^* = \frac{D(y, \mu)}{\phi}$, has a limiting $\chi^2_{n-p}$ distribution, where $n$ is the number of observations and $p$ is the number of estimated parameters.

He then goes on to say that when $\phi$ is unknown, it can be estimated as $\hat{\phi} = \frac{D}{n - p}$. If that is the case, wouldn't the scaled deviance equal $\frac{D(y, \mu)}{D/(n - p)} = n - p$? I think I am misunderstanding the distinction between $D(y, \mu)$ and $D$ (without arguments).

A similar thing occurs with the scaled Pearson's chi-squared statistic.

Could someone elaborate on how to calculate scaled deviance in the case that $ \phi $ is unknown and how to proceed with the g.o.f. test?

kjetil b halvorsen
Jon Claus

1 Answer


I think that when you allow for an unknown dispersion, the GLM is no longer a maximum likelihood technique but instead maximizes a "quasi" likelihood. As a consequence of maximizing the quasilikelihood, the deviance is fixed by using the sample dispersion as the model dispersion. By treating the dispersion as a parameter in a quasilikelihood, the family of, e.g., quasibinomial likelihoods is equivalent to the binomial likelihood up to a proportionality constant. Maximizing the quasilikelihood treats this constant like a nuisance parameter.
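Here is a minimal sketch in R, using simulated overdispersed count data (not anything from the original page), of how a quasi family reports an estimated dispersion while leaving the coefficient estimates unchanged; `quasipoisson` stands in for the quasibinomial case described above:

```r
## Simulated overdispersed counts; compare poisson and quasipoisson fits
set.seed(1)
x <- runif(200)
y <- rnbinom(200, mu = exp(1 + 2 * x), size = 2)   # negative binomial => overdispersed

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)

summary(fit_quasi)$dispersion                   # estimated dispersion, > 1 here
cbind(coef(fit_pois), coef(fit_quasi))          # identical point estimates
cbind(sqrt(diag(vcov(fit_pois))),               # standard errors differ by
      sqrt(diag(vcov(fit_quasi))))              #   a factor of sqrt(dispersion)
```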

Think of it like this: when the probability model underlying the GLM is correct, the scaled deviance divided by its degrees of freedom will have an expected value of about 1 (as confirmed by your formulation and the limiting distribution statement above). But random variation in the data means that the probability model will not always fit such data perfectly.

When that ratio is egregiously different from 1, it is an indication that the working probability model is not a good probability framework for the observed data. This doesn't mean that the inference on the parameters is incorrect. In fact, by using the scaled deviance, you can account for this over- or underdispersion in the working GLM and get correct inference on the parameters. This is the GLM obtained from maximum quasilikelihood.
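As a rough check along these lines (continuing the simulated fit above), you can compare the residual deviance and the Pearson statistic to their degrees of freedom; ratios far from 1 suggest over- or underdispersion relative to the working model:

```r
## Deviance and Pearson X^2 per residual degree of freedom
deviance(fit_pois) / df.residual(fit_pois)
sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
```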

I recommend looking at Alan Agresti's example of horseshoe crabs and quasipoisson and quasibinomial models in Categorical Data Analysis 2nd ed for further clarification.

AdamO
  • Are residuals (such as Anscombe's or Pearson's) a valid measure for goodness of fit? In Generalized Linear Models (Nelder & McCullagh, 1979), they discuss them but only briefly mention them in the context of goodness of fit, without an in-depth discussion of such a technique. – Jon Claus May 10 '13 at 21:56
  • I forgot to continue by saying that goodness of fit loses its meaning when you are no longer performing maximum likelihood. In short, you should look for a different tool depending on whether you aim to do prediction or inference, and verify the robustness of other assumptions. For diagnostics, you could look at the distribution of jackknife or bootstrapped parameter estimates, a plot of Cook's distance, the Hosmer-Lemeshow test of calibration, and a host of other tools. – AdamO May 10 '13 at 21:59
  • Do you know the mechanism by which R decides when to stop iterating during `glm`? It can perform `glm` on a family with a non-constant dispersion, such as seen in the Gamma family. – Jon Claus May 10 '13 at 22:47
  • R uses the Newton-Raphson algorithm, or equivalently Fisher scoring, so it iterates through coefficient estimates and stops when the change between the old and new estimates is below the tolerance of 1e-08. With the Gamma GLM, the dispersion is calculated post hoc. Only in the case of quasilikelihood does R jointly estimate the dispersion and the coefficient estimates. – AdamO May 10 '13 at 23:11
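A small sketch of that iteration control, reusing the simulated data above (the `I(y + 1)` response is just a hypothetical way to get a positive outcome for the Gamma family): `glm.control()` sets the convergence tolerance `epsilon` and the maximum number of iterations, and `trace = TRUE` prints the deviance at each step.

```r
## Gamma GLM with explicit iteration control; the dispersion is reported
## by summary() after the coefficients have converged
fit_gamma <- glm(I(y + 1) ~ x, family = Gamma(link = "log"),
                 control = glm.control(epsilon = 1e-8, maxit = 25, trace = TRUE))
summary(fit_gamma)$dispersion
```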