Suppose we take the InsectSprays data frame in R and fit the linear model lm(count ~ spray, data = InsectSprays). The regression table reports a standard error and a t-value for every level of spray other than the baseline. I'm wondering how these are computed.

I considered an $F$-test comparing a model fit without a specific level against the model with all levels, but I didn't get the same values.

Here is an example where I tried removing the sprayC level and comparing the models; sprayA is used as the baseline level for each.

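# Full model: one 0/1 indicator per non-baseline spray (A is the baseline)
lm.full <- lm(count ~ I(1*(spray == 'B')) +
                I(1*(spray == 'C')) + I(1*(spray == 'D')) +
                I(1*(spray == 'E')) + I(1*(spray == 'F')),
              data = InsectSprays)

# Reduced model: no indicator for C, so C is pooled with the baseline A
lm.noC <- lm(count ~ I(1*(spray == 'B')) + I(1*(spray == 'D')) +
               I(1*(spray == 'E')) + I(1*(spray == 'F')),
             data = InsectSprays)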

summary(lm.full)$coef
anova(lm.noC, lm.full)$F
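
One thing worth checking here: when the two models differ by a single indicator, the F statistic from anova() should match the square of that coefficient's t-value, not the t-value itself. A minimal sketch of that check follows; the names fit_full, fit_noC, t_C and F_C are made up for illustration, and it assumes the factor-based fit count ~ spray gives the same fit as the indicator version above.

```r
# Sketch: compare the squared t-value for sprayC with the nested-model F statistic.
# fit_full / fit_noC are illustrative names, not objects from the code above.
fit_full <- lm(count ~ spray, data = InsectSprays)
fit_noC  <- lm(count ~ I(1 * (spray == "B")) + I(1 * (spray == "D")) +
                 I(1 * (spray == "E")) + I(1 * (spray == "F")),
               data = InsectSprays)

# t-value reported for the sprayC coefficient in the full model
t_C <- summary(fit_full)$coefficients["sprayC", "t value"]

# F statistic for adding the C indicator (first entry of $F is NA)
F_C <- anova(fit_noC, fit_full)$F[2]

# The two entries should agree up to floating-point error
print(c(t_squared = t_C^2, F = F_C))
```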

I also tried comparing the test-statistic values to a two-sample t-test of sprayC against sprayA, and this also didn't yield the same values:

sprayAcounts <- InsectSprays[InsectSprays$spray == "A", "count"]
sprayCcounts <- InsectSprays[InsectSprays$spray == "C", "count"]

t.test(sprayAcounts, sprayCcounts, var.equal = TRUE)
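
A likely reason for the mismatch is that the two-sample test pools the variance over sprays A and C only (22 degrees of freedom), whereas the regression uses the residual variance pooled over all six sprays (66 degrees of freedom). The sketch below redoes the A versus C comparison with the regression's residual standard deviation; the variable names are illustrative, and the formula assumes the one-way layout used here.

```r
# Sketch: A-vs-C comparison using the residual SD from the full model.
fit <- lm(count ~ spray, data = InsectSprays)

count_A <- InsectSprays$count[InsectSprays$spray == "A"]
count_C <- InsectSprays$count[InsectSprays$spray == "C"]

# Residual standard deviation, estimated from all six spray groups
sigma_hat <- summary(fit)$sigma

# Standard error of a difference between two group means
se_C <- sigma_hat * sqrt(1 / length(count_A) + 1 / length(count_C))

# t statistic for (mean of C) - (mean of A)
t_C <- (mean(count_C) - mean(count_A)) / se_C

# Compare with the sprayC row of the regression table
print(c(se = se_C, t = t_C))
print(summary(fit)$coefficients["sprayC", c("Std. Error", "t value")])
```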

How are these test statistics and standard errors computed?

  • I'm most interested in the standard errors; after some further thought, I understand where the test statistics come from. – stats_curious Jun 30 '20 at 12:23
  • With more digging, I found this related post [https://stats.stackexchange.com/questions/44838/how-are-the-standard-errors-of-coefficients-calculated-in-a-regression](https://stats.stackexchange.com/questions/44838/how-are-the-standard-errors-of-coefficients-calculated-in-a-regression) – stats_curious Jun 30 '20 at 12:24
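
For reference, the usual least-squares formula for the coefficient standard errors is the square root of the diagonal of $\hat\sigma^2 (X^\top X)^{-1}$, with $\hat\sigma^2$ the residual sum of squares divided by the residual degrees of freedom. A rough sketch of how that could be reproduced by hand for this model follows; the object names are illustrative only.

```r
# Sketch: coefficient standard errors from the design matrix, by hand.
fit <- lm(count ~ spray, data = InsectSprays)

# Design matrix: intercept column plus one indicator per non-baseline spray
X <- model.matrix(fit)

# Residual variance estimate: RSS divided by residual degrees of freedom
sigma2_hat <- sum(residuals(fit)^2) / df.residual(fit)

# Estimated covariance matrix of the coefficients
vcov_manual <- sigma2_hat * solve(crossprod(X))

# Standard errors are the square roots of the diagonal
se_manual <- sqrt(diag(vcov_manual))

# Side-by-side comparison with the values reported by summary()
print(cbind(manual = se_manual,
            lm = summary(fit)$coefficients[, "Std. Error"]))
```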
