I have been working with glm.nb
from MASS
package for quite a while now. However, there are somethings I seem to not quite able to get my head around. Suppose I have a data that looks like this:
Expression Species timePoint Replicate 40 A T1 R1 60 A T1 R2 48 A T1 R3 52 A T2 R1 58 A T2 R2 64 A T2 R3 39 B T1 R1 48 B T1 R2 54 B T1 R3 448 B T2 R1 490 B T2 R2 378 B T2 R3
Now, if I would like to check if there is expression difference between speciesA
and speciesB
between time points T1
and T2
, then, I do:
require(MASS)
df <- data.frame( Expression=c(40,60,48,52,58,64,39,48,54,448,490,378), Species=c(rep("A",6), rep("B",6)), timePoint=rep(c(rep("T1",3), rep("T2",3)), 2), Replicate=rep(c("R1","R2","R3"),4), stringsAsFactors=T)
nb.fit <- glm.nb( Expression ~ Species * timePoint, data=df, control=glm.control(maxit=25, trace=T) )
summary(nb.fit)
Call:
glm.nb(formula = Expression ~ Species * timePoint, data = df,
control = glm.control(maxit = 25, trace = T), init.theta = 163.3237449,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.57348 -0.78584 0.06399 0.71550 1.27660
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.89860 0.09380 41.565 <2e-16 ***
SpeciesB -0.04845 0.13391 -0.362 0.717
timePointT2 0.16184 0.12879 1.257 0.209
SpeciesB:timePointT2 2.07175 0.16888 12.268 <2e-16 ***
(Dispersion parameter for Negative Binomial(163.3237) family taken to be 1)
Null deviance: 947.708 on 11 degrees of freedom
Residual deviance: 10.024 on 8 degrees of freedom
AIC: 102.06
Number of Fisher Scoring iterations: 1
Theta: 163
Std. Err.: 138
2 x log-likelihood: -92.06
Now, the estimate
obtained can be computed by log( T2/T1 of B) - log( T2/T1 of A) as follows:
> meanVal <- c( t( sapply( split(df, df[,2:3] ), function(x) mean(x[,1] ) ) ) )
> estimate <- log( meanVal[4]/meanVal[2] ) - log( meanVal[3]/meanVal[1] )
> estimate
> [1] 2.071749
Until this I follow. However, from here, I would like to know these:
1) How is the standard error estimated?
3) And how is the fitting of negative binomial distribution influence the std. error, z-value or the p-value? I mean, where does the dispersion parameter
calculated used?
I have read and tried to understand from quite a few tutorials and books. But I don't seem to understand. I would be really grateful if any of you could boil it down for me.
Thank you!