
For example, using the built-in `mtcars` dataset.

A two-model ANOVA:

```r
anova(
  lm(mpg ~ 1, data = mtcars),
  lm(mpg ~ drat, data = mtcars)
)

Model 1: mpg ~ 1
Model 2: mpg ~ drat
  Res.Df     RSS Df Sum of Sq     F    Pr(>F)
1     31 1126.05
2     30  603.57  1    522.48 25.97 1.776e-05 ***
```
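As a sanity check, the row-2 $F$ here can be reproduced by hand from the printed values, using the second model's residual mean square as the denominator:

```r
# F = (RSS drop from model 1 to model 2) / (residual mean square of model 2)
(1126.05 - 603.57) / (603.57 / 30)  # ~25.97, matching the table above
```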

A three-model ANOVA:

```r
anova(
  lm(mpg ~ 1, data = mtcars),
  lm(mpg ~ drat, data = mtcars),
  lm(mpg ~ drat + wt, data = mtcars)
)

Model 1: mpg ~ 1
Model 2: mpg ~ drat
Model 3: mpg ~ drat + wt
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1     31 1126.05
2     30  603.57  1    522.48 56.276 2.871e-08 ***
3     29  269.24  1    334.33 36.010 1.589e-06 ***
```

Now, looking at row 2 of each table, you will notice that the sum of squares and degrees of freedom are the same in both, but the $F$ statistic is 25.97 in the first test and 56.276 in the second. Why is this? How is $F$ calculated in each case?

My understanding of the $F$ statistic is that it is the ratio of two chi-squared RVs, each divided by its degrees of freedom, i.e. $F = \frac{\mathrm{ESS}(\gamma_j \mid \gamma_{j-1})/r}{\mathrm{RSS}_j/(n-p)}$ (where $r$ is the difference in the number of parameters between the two models). However, all of these components are the same between the two tests, so I don't understand why $F$ would differ.
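Plugging the printed row-2 numbers into this formula (a quick numeric sketch) only reproduces the two-model table:

```r
# Row 2 is identical in both tables: ESS = 522.48 on r = 1 df,
# RSS_2 = 603.57 on 30 residual df
522.48 / (603.57 / 30)  # ~25.97: matches the two-model table
# The three-model table instead prints 56.276, which numerically matches
# using model 3's residual mean square as the denominator:
522.48 / (269.24 / 29)  # ~56.28
```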

Migwell
  • The DFs relate to the difference between consecutive models, while the SS column gives the residual sum of squares of the model under consideration (and it remains the same, as expected). `anova` (or `drop1`, etc.) is about comparing [nested models](https://stats.stackexchange.com/a/8247/930). – chl Nov 03 '20 at 12:18
  • Yes, I realise that, but the DF entries for row 2 are the same in both tables. I'm not interested in row 3. – Migwell Nov 03 '20 at 12:30
  • That's 1 DF for the added predictor between the base model (`mpg ~ 1`, row 1) and the next ones, which both include numerical predictors (1 regression coefficient each). – chl Nov 03 '20 at 13:06

0 Answers