How does glm.nb work?

Question

I have been working with glm.nb from the MASS package for quite a while now. However, there are some things I can't quite get my head around. Suppose I have data that look like this:

Expression  Species  timePoint  Replicate
40            A          T1       R1
60            A          T1       R2
48            A          T1       R3
52            A          T2       R1
58            A          T2       R2
64            A          T2       R3
39            B          T1       R1
48            B          T1       R2
54            B          T1       R3
448           B          T2       R1
490           B          T2       R2
378           B          T2       R3

Now, if I would like to check whether the change in expression between time points T1 and T2 differs between species A and species B, I do:

require(MASS)
df <- data.frame( Expression=c(40,60,48,52,58,64,39,48,54,448,490,378), Species=c(rep("A",6), rep("B",6)), timePoint=rep(c(rep("T1",3), rep("T2",3)), 2), Replicate=rep(c("R1","R2","R3"),4), stringsAsFactors=T)
nb.fit <- glm.nb( Expression ~ Species * timePoint, data=df, control=glm.control(maxit=25, trace=T) )
summary(nb.fit)

Call:  
glm.nb(formula = Expression ~ Species * timePoint, data = df, 
control = glm.control(maxit = 25, trace = T), init.theta = 163.3237449, 
link = log)  

Deviance Residuals: 
 Min        1Q    Median        3Q       Max  
-1.57348  -0.78584   0.06399   0.71550   1.27660  

Coefficients:

                     Estimate Std. Error z value Pr(>|z|)    
(Intercept)           3.89860    0.09380  41.565   <2e-16 ***
SpeciesB             -0.04845    0.13391  -0.362    0.717     
timePointT2           0.16184    0.12879   1.257    0.209     
SpeciesB:timePointT2  2.07175    0.16888  12.268   <2e-16 *** 

(Dispersion parameter for Negative Binomial(163.3237) family taken to be 1)
    Null deviance: 947.708  on 11  degrees of freedom
Residual deviance:  10.024  on  8  degrees of freedom
AIC: 102.06

Number of Fisher Scoring iterations: 1
              Theta:  163 
          Std. Err.:  138 
 2 x log-likelihood:  -92.06

Now, the estimate obtained for the interaction term SpeciesB:timePointT2 can be computed from the group means as log(T2/T1 of B) - log(T2/T1 of A), as follows:

> meanVal <- c( t( sapply( split(df, df[,2:3] ), function(x) mean(x[,1] ) ) ) )
> estimate <- log( meanVal[4]/meanVal[2] ) - log( meanVal[3]/meanVal[1] )
> estimate
[1] 2.071749

Up to this point I follow. However, from here, I would like to know the following:
1) How is the standard error estimated?
3) And how is the fitting of negative binomial distribution influence the std. error, z-value or the p-value? I mean, where is the estimated dispersion parameter used?

I have read and tried to understand from quite a few tutorials and books. But I don't seem to understand. I would be really grateful if any of you could boil it down for me.

Thank you!

I have a couple of questions: 1) Which standard error do you mean? One of those for the coefficients, or the one for theta? 2) The same question: which Estimate and Std. Error are you referring to? 3) I don't understand what you mean by "how is the fitting of negative binomial distribution influence the std. error, z-value or the p-value", sorry. Please say what you mean by that. And for the second part of 3): is this a general question on how to estimate the dispersion parameter (and do you mean the "Dispersion parameter for Negative Binomial(163.3237) family taken to be 1", or the shape parameter theta)? — Momo, May 04 '12 at 17:30
If you clarify it I might be able to help you out. That's why I ask. Don't want to appear rude, am not an English native speaker myself. — Momo, May 04 '12 at 17:31
How deep an explanation do you want? Do you know how the linear regression model is estimated? How a generalized linear model such as logistic or Poisson regression is estimated? Since you don't even know that the z-value = estimate/standard error, and that the p-value comes from a normal distribution, perhaps "some magic happens" is the best explanation for the standard error. — Aniko, May 04 '12 at 18:17
Momo, in the example above, under coefficients, there is a column "std. error" and it has a value of 0.16888 for speciesB:timePointT2. It would be great to know how to obtain that. — Arun, May 04 '12 at 21:59
Aniko, I am not a statistician. However, I know what a z-value is. I know how a p-value is obtained from z-value. I don't recall asking these questions. I don't understand much of glm yet. Standard error calculation in linear regression is straight-forward. What I don't understand is the question about the relation to dispersion parameter (value of 163.3237) to the standard error. Since you seem to understand things better, why not take a step to explain it to a non-statistician? If not, why bother to take the time to explain that "magic happens"? — Arun, May 04 '12 at 22:22
Related: @Gavin-Simpson's answer here http://stats.stackexchange.com/questions/70619/dispersion-in-summary-glm — Momo, Sep 21 '13 at 11:52

Momo · Accepted Answer · 2013-09-20T22:11:21.883

Thanks for clarifying. So, it appears you want to have the inner workings of GLM estimation explained. I can give a sketch, but I doubt it will help you much. It's probably better to read a book on GLMs, e.g. McCullagh and Nelder's book.

Anyway:

Question 1

The standard error for the $\beta_j$ in a GLM that uses Fisher scoring or IWLS (iteratively weighted least squares) is calculated as:

The square roots of the diagonal elements of

$\operatorname{cov}(\hat{\beta}) = \phi\,(X^T\hat{W}X)^{-1}$

in which $(X^T\hat{W}X)^{-1}$ is a by-product of the final IWLS iteration (the inverse of the estimated Fisher information). If $\phi$ is unknown, an estimate is required (as in quasi families). glm.nb achieves this by fitting a negative binomial model with a fixed shape (or a Poisson model in the initial fit), then estimating the shape parameter given the fitted means, and alternating the two steps until convergence. Hence the standard error is calculated as it would be with glm(..., family = negative.binomial(shape)), i.e. with the shape treated as known. (Edit: The estimated shape parameter in your example is 163.32.)
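As a rough illustration of the formula above, the standard errors can be rebuilt by hand from the fitted model. This is only a sketch under stated assumptions: it uses the usual IWLS working weights for a log link, $w_i = (d\mu_i/d\eta_i)^2 / V(\mu_i)$ with $V(\mu) = \mu + \mu^2/\theta$, takes $\phi = 1$, and plugs in the final estimate of theta. The names w, cov.manual and se.manual are made up for this example.

```r
library(MASS)

# Data from the question
df <- data.frame(
  Expression = c(40, 60, 48, 52, 58, 64, 39, 48, 54, 448, 490, 378),
  Species    = factor(rep(c("A", "B"), each = 6)),
  timePoint  = factor(rep(rep(c("T1", "T2"), each = 3), times = 2))
)
nb.fit <- glm.nb(Expression ~ Species * timePoint, data = df)

X     <- model.matrix(nb.fit)  # design matrix
mu    <- fitted(nb.fit)        # fitted means
theta <- nb.fit$theta          # estimated shape parameter

# Working weights for the log link: (dmu/deta)^2 / V(mu),
# with dmu/deta = mu and V(mu) = mu + mu^2 / theta
w <- mu^2 / (mu + mu^2 / theta)

# phi is fixed at 1, so cov(beta-hat) = (X' W X)^{-1};
# w * X scales row i of X by w[i], i.e. it equals W %*% X
cov.manual <- solve(t(X) %*% (w * X))
se.manual  <- sqrt(diag(cov.manual))

# Side-by-side comparison with the values reported by summary()
cbind(manual  = se.manual,
      summary = coef(summary(nb.fit))[, "Std. Error"])
```

The two columns should agree up to numerical tolerance, which shows that theta enters the standard errors only through the weights.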

Question 2

Has already been explained. The $z$ value is a Wald statistic: the estimate of $\beta_j$ divided by its standard error (the square root of the corresponding diagonal element from above), i.e.

$z_j=\frac{\hat{\beta}_j}{\sqrt{\phi\,\left[(X^T\hat{W}X)^{-1}\right]_{jj}}}$
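To make the Wald test concrete, here is a small sketch that recomputes the z values and two-sided p-values from the estimates and the covariance matrix stored in the summary object. The variable names est, se, z and p are chosen for this example only.

```r
library(MASS)

# Data from the question
df <- data.frame(
  Expression = c(40, 60, 48, 52, 58, 64, 39, 48, 54, 448, 490, 378),
  Species    = factor(rep(c("A", "B"), each = 6)),
  timePoint  = factor(rep(rep(c("T1", "T2"), each = 3), times = 2))
)
nb.fit <- glm.nb(Expression ~ Species * timePoint, data = df)

est <- coef(nb.fit)                            # beta-hat
se  <- sqrt(diag(summary(nb.fit)$cov.scaled))  # standard errors (phi = 1)

z <- est / se             # Wald statistic
p <- 2 * pnorm(-abs(z))   # two-sided p-value from the standard normal

cbind(Estimate = est, `Std. Error` = se, `z value` = z, `Pr(>|z|)` = p)
```

The resulting table should match the coefficient table printed by summary(nb.fit).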

Question 3

I still don't understand the part about "And how is the fitting of negative binomial distribution influence the std. error, z-value or the p-value?"

But I think you would like to know where the dispersion parameter comes from: The dispersion parameter $\phi$ here is simply fixed at 1 (because it is a Negative Binomial GLM with known shape parameter that is used in the second stage).
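The two-stage idea can be imitated with ordinary glm() calls. The following is a simplified sketch, not the actual glm.nb source: it alternates between a negative binomial GLM with the shape held fixed (MASS::negative.binomial) and a maximum-likelihood update of the shape (MASS::theta.ml). The stopping rule and the cap of 25 outer iterations are assumptions made for this example.

```r
library(MASS)

# Data from the question
df <- data.frame(
  Expression = c(40, 60, 48, 52, 58, 64, 39, 48, 54, 448, 490, 378),
  Species    = factor(rep(c("A", "B"), each = 6)),
  timePoint  = factor(rep(rep(c("T1", "T2"), each = 3), times = 2))
)
y <- df$Expression

# Initial fit: Poisson GLM provides starting values for the means
fit   <- glm(Expression ~ Species * timePoint, family = poisson, data = df)
theta <- as.numeric(theta.ml(y, fitted(fit), limit = 25))

for (i in 1:25) {
  # Stage 1: negative binomial GLM with the shape treated as known
  fit <- glm(Expression ~ Species * timePoint,
             family = negative.binomial(theta), data = df)
  # Stage 2: update the shape by maximum likelihood given the new means
  theta.new <- as.numeric(theta.ml(y, fitted(fit), limit = 25))
  # Stopping rule assumed for this sketch (not taken from glm.nb)
  converged <- abs(theta.new - theta) < 1e-6 * theta
  theta <- theta.new
  if (converged) break
}

theta  # compare with glm.nb(Expression ~ Species * timePoint, data = df)$theta

# With the shape known, the dispersion is fixed at 1 rather than estimated
summary(fit, dispersion = 1)$coefficients
```

Calling summary(fit) without dispersion = 1 would make summary.glm estimate $\phi$ from the Pearson residuals, which corresponds to the quasi-likelihood treatment rather than the negative binomial likelihood.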

Hi Momo, this is great. Thanks a lot for the not-magic-happens answer. Now I understand the idea! At least I can think of a couple of questions now to ask and learn about. I just didn't know what to address in certain scenarios. I get the picture. Let me read through the concepts of glm once more now and will get back with more questions. Thanks a ton again! — Arun, May 05 '12 at 14:13
Momo, many thanks for linking to GavinSimpson's answer. And re-reading your answer here, I follow a bit more than before (I hope.. :)). A question: Why get to know "shape" so that dispersion can be 1, why not directly calculate dispersion? If one needs to get the dispersion of the NB model, how should one go about it? Also, this "shape" parameter, is it a result of taking NB as a gamma-Poisson mixture? What I'd like to do is compute the mean and variance of the NB that's fit for this data (for which I'll have to get the dispersion).. right? Sorry if I seem to have gotten something wrong... — Arun, Sep 21 '13 at 12:46
If you estimate the dispersion it is no longer a proper likelihood but a quasi likelihood. The shape parameter is the second parameter of the negative binomial distribution, which in this formulation comes indeed from the mixture. Mean and variance are simple to compute: The mean is mu and the variance is mu+mu^2/theta. — Momo, Nov 25 '13 at 15:07
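For the data in the question, the mean and variance mentioned in the last comment could be computed along these lines. This is a minimal sketch; cell, mu.cell and var.cell are names invented for the example.

```r
library(MASS)

# Data from the question
df <- data.frame(
  Expression = c(40, 60, 48, 52, 58, 64, 39, 48, 54, 448, 490, 378),
  Species    = factor(rep(c("A", "B"), each = 6)),
  timePoint  = factor(rep(rep(c("T1", "T2"), each = 3), times = 2))
)
nb.fit <- glm.nb(Expression ~ Species * timePoint, data = df)
theta  <- nb.fit$theta

# One fitted mean per Species x timePoint cell
cell    <- interaction(df$Species, df$timePoint)
mu.cell <- sapply(split(fitted(nb.fit), cell), mean)

# Negative binomial variance: mu + mu^2 / theta
var.cell <- mu.cell + mu.cell^2 / theta

# The Poisson variance (equal to the mean) is shown for comparison
cbind(mean = mu.cell, nb.variance = var.cell, poisson.variance = mu.cell)
```

The extra term mu^2 / theta is the overdispersion relative to the Poisson model, and it shrinks towards zero as theta grows.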
