Suppose we take the InsectSprays data frame in R and fit the linear model lm(count ~ spray, data = InsectSprays). The regression table reports a standard error and a t-value for the intercept and for each non-baseline level of spray. I'm wondering how these are computed.
I considered an $F$-test comparing a model fit without a specific level against the model with all levels, but I didn't get the same values.
Here is an example where I tried removing the sprayC level and comparing the models; sprayA is the baseline level in both.
lm.full <- lm(count ~ I(1*(spray == 'B')) + I(1*(spray == 'C')) +
                I(1*(spray == 'D')) + I(1*(spray == 'E')) +
                I(1*(spray == 'F')),
              data = InsectSprays)
lm.noC <- lm(count ~ I(1*(spray == 'B')) + I(1*(spray == 'D')) +
               I(1*(spray == 'E')) + I(1*(spray == 'F')),
             data = InsectSprays)
summary(lm.full)$coef
anova(lm.noC, lm.full)$F
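One check that may be worth making at this point: when two nested models differ by a single coefficient, the $F$ statistic from the comparison should equal the square of that coefficient's t-value, not the t-value itself. The sketch below makes that comparison. The names fit.full, fit.noC, t.sprayC and F.sprayC are my own, chosen only for illustration.

```r
# Sketch: compare the F statistic for dropping the sprayC indicator
# with the squared t-value of sprayC in the full model.
fit.full <- lm(count ~ spray, data = InsectSprays)

# Same model without a sprayC term, so sprays A and C share the intercept.
fit.noC <- lm(count ~ I(1 * (spray == "B")) + I(1 * (spray == "D")) +
                I(1 * (spray == "E")) + I(1 * (spray == "F")),
              data = InsectSprays)

t.sprayC <- summary(fit.full)$coefficients["sprayC", "t value"]
F.sprayC <- anova(fit.noC, fit.full)$F[2]  # first entry is NA

# Print both side by side, then test for equality up to rounding error.
c(t.squared = t.sprayC^2, F = F.sprayC)
all.equal(t.sprayC^2, F.sprayC)
```

If the two numbers agree, the $F$-test and the t-value are testing the same hypothesis and only the scale differs.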
I also tried comparing the test statistics to a two-sample t-test of sprayC against sprayA, which also didn't yield the same values:
sprayAcounts <- InsectSprays[InsectSprays$spray == "A", "count"]
sprayCcounts <- InsectSprays[InsectSprays$spray == "C", "count"]
t.test(sprayAcounts, sprayCcounts, var.equal = TRUE)
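For comparison, here is a sketch of the textbook calculation under the assumption that the regression pools the residual variance over all six sprays (66 residual degrees of freedom), whereas the two-sample test pools over sprays A and C only (22 degrees of freedom). It also evaluates the general OLS formula $\sqrt{\operatorname{diag}(\hat\sigma^2 (X^\top X)^{-1})}$. The variable names are mine, for illustration only.

```r
# Sketch: rebuild the sprayC row of the regression table by hand.
fit <- lm(count ~ spray, data = InsectSprays)

sigma.hat <- summary(fit)$sigma  # residual SD pooled over all six sprays
n     <- tapply(InsectSprays$count, InsectSprays$spray, length)
means <- tapply(InsectSprays$count, InsectSprays$spray, mean)

# Difference in group means (C minus baseline A) and its standard error.
est <- unname(means["C"] - means["A"])
se  <- unname(sigma.hat * sqrt(1 / n["C"] + 1 / n["A"]))

c(estimate = est, se = se, t = est / se)
summary(fit)$coefficients["sprayC", ]

# General formula: sqrt of the diagonal of sigma^2 * (X'X)^{-1}.
X <- model.matrix(fit)
se.all <- sqrt(diag(sigma.hat^2 * solve(crossprod(X))))
se.all
```

The estimate matches the two-sample difference in means apart from its sign, since t.test(sprayAcounts, ...) takes A minus C, so any remaining discrepancy in the t-value would come from the variance estimate and the degrees of freedom.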
How are these test statistics and standard errors computed?