
This paper uses generalised linear models (with both binomial and negative binomial error distributions) to analyse its data. But then in the statistical analysis section of the methods, there is this statement:

...and second by modelling the presence data using Logistic Regression Models, and the foraging time data using a Generalized Linear Model (GLM). A negative binomial distribution with a log link function was used to model the foraging time data (Welsh et al. 1996) and model adequacy was verified by examination of residuals (McCullagh & Nelder 1989). Shapiro–Wilk or Kolmogorov–Smirnov tests were used to test for normality depending on sample size; data were log-transformed before analyses to adhere to normality.

If they assume binomial and negative binomial error distributions, then surely they shouldn't be checking for normality of residuals?

luciano
  • 2
    Note that the *errors* aren't binomially distributed - each response is binomially distributed with a probability parameter given by the corresponding predictor values, as per the answers to [one of your other questions](http://stats.stackexchange.com/questions/91473/how-does-logistic-regression-use-the-binomial-distribution). – Scortchi - Reinstate Monica Apr 03 '14 at 16:43
  • 3
    There's nothing in binomial or negative binomial regression than needs to be normal. If it's the response they transform, that may well be highly counterproductive; it will screw up the GLM. – Glen_b Apr 04 '14 at 09:29
  • 1
    It's not clear from your quote what they're in fact testing for normality (are you sure it's the residuals?) or for what analysis they're transforming data (are you sure it's the GLMs?). – Scortchi - Reinstate Monica Apr 04 '14 at 09:55
  • I've expanded quote a little bit. Could someone confirm if what the authors of the paper did was wrong or right? – luciano Apr 04 '14 at 12:43
  • I'm afraid it's still not terribly clear - contact the authors for detail of how they carried out the analysis if it's not explained elsewhere in the paper or its references. – Scortchi - Reinstate Monica Apr 04 '14 at 12:53
  • The only two models they use in the paper are binomial and negative binomial. So if binomial and negative binomial models dont require normal residuals, then their analysis is wrong – luciano Apr 04 '14 at 13:22

2 Answers


NB the deviance (or Pearson) residuals are not expected to have a normal distribution except for a Gaussian model. For the logistic regression case, as @Stat says, deviance residuals for the $i$th observation $y_i$ are given by

$$r^{\mathrm{D}}_i=-\sqrt{2\left|\log{(1-\hat{\pi}_i)}\right|}$$

if $y_i=0$ &

$$r^{\mathrm{D}}_i=\sqrt{2\left|\log{(\hat{\pi}_i)}\right|}$$

if $y_i=1$, where $\hat{\pi}_i$ is the fitted Bernoulli probability. As each can take only one of two values, it's clear their distribution cannot be normal, even for a correctly specified model:

# generate Bernoulli probabilities from true model
x <- rnorm(100)
p <- exp(x) / (1 + exp(x))

# one replication per predictor value
n <- rep(1, 100)
# simulate response
y <- rbinom(100, n, p)
# fit model
mod <- glm(cbind(y, n - y) ~ x, family = "binomial")
# make quantile-quantile plot of residuals
qqnorm(residuals(mod, type = "deviance"))
abline(a = 0, b = 1)

[Figure: normal Q-Q plot of deviance residuals with one replicate per predictor pattern]

But if there are $n_i$ replicate observations for the $i$th predictor pattern, & the deviance residual is defined so as to gather these up

$$r^{\mathrm{D}}_i=\operatorname{sgn}({y_i-n_i\hat{\pi}_i})\sqrt{2\left[y_i\log{\frac{y_i}{n_i\hat{\pi}_i}} + (n_i-y_i)\log{\frac{n_i-y_i}{n_i(1-\hat{\pi}_i)}}\right]}$$

(where $y_i$ is now the count of successes from 0 to $n_i$) then as $n_i$ gets larger the distribution of the residuals approximates more to normality:

# many replications per predictor value
n <- rep(30, 100)
# simulate response
y <- rbinom(100, n, p)
# fit model
mod <- glm(cbind(y, n - y) ~ x, family = "binomial")
# make quantile-quantile plot of residuals
qqnorm(residuals(mod, type = "deviance"))
abline(a = 0, b = 1)

[Figure: normal Q-Q plot of deviance residuals with 30 replicates per predictor pattern]

Things are similar for Poisson or negative binomial GLMs: for low predicted counts the distribution of residuals is discrete & skewed, but tends to normality for larger counts under a correctly specified model.
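To illustrate that last point, here's a minimal simulation sketch for the Poisson case (the intercepts −1 and 4 are arbitrary choices, picked only to give small and large expected counts respectively):

```r
# Poisson analogue: deviance residuals approach normality as counts grow
set.seed(1)
x <- rnorm(100)

# small expected counts: residual distribution is discrete & skewed
y.small <- rpois(100, exp(-1 + x))
mod.small <- glm(y.small ~ x, family = "poisson")
qqnorm(residuals(mod.small, type = "deviance"), main = "Small counts")
abline(a = 0, b = 1)

# large expected counts: residuals lie much closer to the reference line
y.large <- rpois(100, exp(4 + x))
mod.large <- glm(y.large ~ x, family = "poisson")
qqnorm(residuals(mod.large, type = "deviance"), main = "Large counts")
abline(a = 0, b = 1)
```

Comparing the two Q-Q plots shows the same pattern as in the binomial example above: normality of deviance residuals is only an asymptotic approximation, here in the size of the counts.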

It's not usual, at least not in my neck of the woods, to conduct a formal test of residual normality; if normality testing is essentially useless when your model assumes exact normality, then a fortiori it's useless when it doesn't. Nevertheless, for unsaturated models, graphical residual diagnostics are useful for assessing the presence & the nature of lack of fit, taking normality with a pinch or a fistful of salt depending on the number of replicates per predictor pattern.

Scortchi - Reinstate Monica

What they did is correct! I will give you a reference to double-check: see Section 13.4.4 in Introduction to Linear Regression Analysis, 5th Edition, by Douglas C. Montgomery, Elizabeth A. Peck, & G. Geoffrey Vining. In particular, look at the examples on page 460, where they fit a binomial GLM and check the normality assumption of the deviance residuals. As mentioned on page 458, this is because "the deviance residuals behave much like ordinary residuals do in a standard normal-theory linear regression model". So it makes sense to plot them on a normal probability scale as well as against fitted values (again, see page 456 of the above reference). In the examples provided on pages 460 and 461, not only for the binomial case but also for the Poisson GLM and the Gamma GLM (with log link), they have checked the normality of the deviance residuals.
For the binomial case the deviance residual is defined as: $$r^{\mathrm{D}}_i=-\sqrt{2\left|\ln{(1-\hat{\pi}_i)}\right|}$$ if $y_i=0$ and $$r^{\mathrm{D}}_i=\sqrt{2\left|\ln{(\hat{\pi}_i)}\right|}$$ if $y_i=1$. Now some coding in R to show you how you can get them:

> attach(npk)

> #Fitting binomial glm
> fit.1=glm(P~yield,family=binomial(logit))
> 
> #Getting deviance residuals directly
> rd=residuals(fit.1,type = c("deviance"))
> rd
         1          2          3          4          5          6          7 
 1.1038306  1.2892945 -1.2912991 -1.1479881 -1.1097832  1.2282009 -1.1686771 
         8          9         10         11         12         13         14 
 1.1931365  1.2892945  1.1903473 -0.9821829 -1.1756061 -1.0801690  1.0943912 
        15         16         17         18         19         20         21 
-1.3099491  1.0333213  1.1378369 -1.2245380 -1.2485566  1.0943912 -1.1452410 
        22         23         24 
 1.2352561  1.1543163 -1.1617642 
> 
> 
> #Estimated success probabilities
> pi.hat=fitted(fit.1)
> 
> #Computing deviance residuals from the formula, as a check
> rd.check=-sqrt(2*abs(log(1-pi.hat)))
> rd.check[P==1]=sqrt(2*abs(log(pi.hat[P==1])))
> rd.check
         1          2          3          4          5          6          7 
 1.1038306  1.2892945 -1.2912991 -1.1479881 -1.1097832  1.2282009 -1.1686771 
         8          9         10         11         12         13         14 
 1.1931365  1.2892945  1.1903473 -0.9821829 -1.1756061 -1.0801690  1.0943912 
        15         16         17         18         19         20         21 
-1.3099491  1.0333213  1.1378369 -1.2245380 -1.2485566  1.0943912 -1.1452410 
        22         23         24 
 1.2352561  1.1543163 -1.1617642 
> 

Check here for the Poisson case as well.

Stat
  • Don't have access to the reference but think I can understand why residuals should be normally distributed. For example, in a Poisson regression, I'm guessing the model somehow deals with the Poisson distribution of the raw values, with the net result being normally distributed residuals. But maybe you could expand on what deviance residuals are? – luciano Apr 03 '14 at 14:31
  • Your example is an odd choice. *Did* you make a P-P or Q-Q plot of those deviance residuals; if so, what did you conclude? – Scortchi - Reinstate Monica Apr 03 '14 at 15:59
  • No, I didn't make any plot. I gave this example to show how to obtain deviance residuals, NOT to check the model! – Stat Apr 03 '14 at 16:05
  • Point is in this case there'd be no sense in checking the normality of the residuals - they're clearly not normally distributed, nor should they be. It's only as the number of observations for each predictor pattern increases that the distribution of residuals (one residual being calculated per predictor pattern) tends to the normal. Similarly for a Poisson or negative binomial model - the counts need to be large-ish for the normal approximation to be good. – Scortchi - Reinstate Monica Apr 03 '14 at 16:22
  • Scortchi, clearly you misunderstood the point of giving this code. As I said before, I gave this example to SHOW HOW TO OBTAIN THE DEVIANCE RESIDUALS and double-check the formula. Nothing more! I suppose it should be clear enough. I picked this data and the model randomly, so don't spend too much time on model checking as it would be completely useless for this example. – Stat Apr 03 '14 at 16:47
  • Scortchi, in the linked paper, do you think the residuals should have been normally distributed? – luciano Apr 03 '14 at 16:56
  • The question is whether residuals from generalized linear models should be normally distributed. Your answer *appears* to be an unqualified "yes" (though your sources doubtless give the necessary qualifications, not every reader will check them). You then give an example in which there is *no reason at all* to expect the residuals to be normally distributed, even if the model were correctly specified: an unwary reader will assume that they should be & that, as they are clearly not, this is therefore an example of detecting model mis-specification by examining the residuals (though you ... – Scortchi - Reinstate Monica Apr 03 '14 at 20:16
  • ... haven't said it is). So I think the answer requires much clarification to be useful. – Scortchi - Reinstate Monica Apr 03 '14 at 20:17
  • I do know Montgomery from other books, which is why I said this one doubtless qualifies the statement that residuals from GLMs are normally distributed by explaining that it's only an asymptotic approximation. – Scortchi - Reinstate Monica Apr 03 '14 at 21:50
  • IMO @Scortchi's comments are reasonable here. Looking at what I can see of the Montgomery book [on google books preview](http://books.google.ca/books?id=0yR4KUL4VDkC&printsec=frontcover&dq=Introduction+to+Linear+Regression+Analysis&hl=en&sa=X&ei=kUc_U5-WAoS-sQTv0IK4DA&ved=0CDcQ6AEwAA#v=onepage&q=deviance&f=false), they do make the Q-Q plot but don't perform an actual normality test like the one mentioned by the original poster. Sure, making the Q-Q plot is reasonable as a diagnostic, but in pretty much all realistic circumstances even the deviance residuals won't be normal. – Andy W Apr 05 '14 at 00:09
  • There are a couple of things I should say: 1) The OP is not asking how to check normality in R, so there is no need to show how to make a Q-Q plot or run other tests. 2) The OP is not asking whether those tests or the visual check using a Q-Q plot should be applied; the OP wants to know whether normality should be checked at all in those models, or at least that is what I understand. 3) In Montgomery, they didn't use tests like Shapiro–Wilk, because a small departure from normality will result in rejecting the null of normality using those tests, which is not practical. – Stat Apr 05 '14 at 00:31
  • Stat, I believe you're misunderstanding the meaning of the word "unqualified". It's not a reference to Montgomery's qualifications; it means that Montgomery (being a very able statistician) will not have made such a statement as if it were generally true when it is not -- i.e. would not say it without giving a clear indication of the circumstances in which it holds (a condition that must be fulfilled is called a *qualification*). I'd advise careful re-reading of Montgomery et al. to find where that qualification appears (it might be close to what you're looking at or it may be somewhat earlier). – Glen_b Nov 25 '16 at 01:45