
Below is a brief snippet of R syntax. If you run it, you will see that the model 2 vs. model 3 comparison produces the same F ratio in each output. However, the model 1 vs. model 2 comparison produces different F ratios. My understanding is that these should be the same. Any insights as to what is happening would be appreciated.

# Enter the 52 scores interactively; scan() reads the 13 rows of 4 values below
prob1 <- scan()
99.9   81.2   77.8   66.4
99.9   80.4   81.6   80.6
42.2   50.9   46.9   83.0
91.2   85.8   82.0   74.1
47.4   47.9   40.5   60.2
70.1   88.2   57.8   61.7
54.3   45.8   52.1   88.0
71.9   57.6   59.2   99.9
60.8   60.9   55.5   99.9
96.0   71.2   75.3   99.9
80.0   77.2   64.9   99.9
92.3   80.7   73.3   99.9
52.7   57.5   56.8   48.6

# Reshape the vector into a 13 x 4 data frame and name the columns
prob1 <- data.frame(t(array(prob1, dim = c(4, 13))))
names(prob1) <- c("pchem1", "mvcalc", "pchem2", "molbio")

# Three nested models, adding one predictor at a time
lm.1 <- lm(pchem2 ~ pchem1, data = prob1)
lm.2 <- lm(pchem2 ~ pchem1 + mvcalc, data = prob1)
lm.3 <- lm(pchem2 ~ pchem1 + mvcalc + molbio, data = prob1)

anova(lm.1, lm.2)          # model 1 vs. model 2
anova(lm.2, lm.3)          # model 2 vs. model 3
anova(lm.1, lm.2, lm.3)    # all three models in sequence
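
For concreteness, here is a minimal sketch of where the difference arises (the `rss` helper is purely illustrative, not part of the original analysis). The model 1 vs. model 2 numerator is the same in both calls; only the denominator changes:

rss <- function(m) sum(resid(m)^2)   # residual sum of squares
# As in anova(lm.1, lm.2): error term from model 2
(rss(lm.1) - rss(lm.2)) / (rss(lm.2) / df.residual(lm.2))
# As in anova(lm.1, lm.2, lm.3), model 1 vs. model 2 row: error term from model 3
(rss(lm.1) - rss(lm.2)) / (rss(lm.3) / df.residual(lm.3))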

FOLLOW-UP #1:
The first anova() call computes the model 1 vs. model 2 F ratio as $\frac{R_2^2-R_1^2}{1-R_2^2}\cdot\frac{n-3}{1}$.
The third anova() call computes that same comparison as $\frac{R_2^2-R_1^2}{1-R_3^2}\cdot\frac{n-4}{1}$.
If anyone can provide a reference for this, it would be appreciated. Additionally, an explanation of why the step-by-step pairwise answer might be incorrect would be welcome. (I have taught it this way, and all the texts I've referenced indicate doing it this way.)
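
As a check on these formulas, here is a hedged sketch (assuming the models fit above, with $n = 13$) that reproduces both F ratios from the models' $R^2$ values:

# Reproduce both F ratios for model 1 vs. model 2 from the R^2 values
r2 <- sapply(list(lm.1, lm.2, lm.3), function(m) summary(m)$r.squared)
n  <- nrow(prob1)
(r2[2] - r2[1]) / (1 - r2[2]) * (n - 3)   # matches anova(lm.1, lm.2)
(r2[2] - r2[1]) / (1 - r2[3]) * (n - 4)   # matches the three-model anova() row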

FOLLOW-UP #2:
SPSS (by default) reports model comparisons pairwise, not in reference to the last model in the list.

Gregg H

1 Answer


After sleeping on this, I have come around and have deleted my previous, not-well-considered response. I believe that R is doing the wrong thing in that third anova() call. To elaborate, consider what we get if we call anova() on just the third model:

> anova(lm.3)

Analysis of Variance Table

Response: pchem2
          Df  Sum Sq Mean Sq F value    Pr(>F)    
pchem1     1 2011.07 2011.07 98.0593 3.882e-06
mvcalc     1    2.91    2.91  0.1419    0.7151    
molbio     1    7.17    7.17  0.3497    0.5688    
Residuals  9  184.58   20.51

I would say that this output is meaningful from the viewpoint that we are given one model and are testing the contribution of each term as it is entered sequentially.
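
A quick hedged check (using the sums of squares from the table above) confirms that every F ratio in this sequential table uses the full model's mean squared error as its denominator:

> mse3 <- sum(resid(lm.3)^2) / df.residual(lm.3)  # 184.58 / 9 = 20.51
> c(2011.07, 2.91, 7.17) / mse3                   # 98.06, 0.142, 0.350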

Now, consider:

> lm.0 <- lm(pchem2 ~ 1, data = prob1)
> anova(lm.0, lm.1, lm.2, lm.3)

Analysis of Variance Table

Model 1: pchem2 ~ 1
Model 2: pchem2 ~ pchem1
Model 3: pchem2 ~ pchem1 + mvcalc
Model 4: pchem2 ~ pchem1 + mvcalc + molbio

  Res.Df     RSS Df Sum of Sq       F    Pr(>F)    
1     12 2205.73                                   
2     11  194.66  1   2011.07 98.0593 3.882e-06
3     10  191.75  1      2.91  0.1419    0.7151    
4      9  184.58  1      7.17  0.3497    0.5688

These $F$ tests are identical to the previous ones. That is, we have the same answers, but different questions.

I think that the call anova(lm.0, lm.1, lm.2, lm.3) asks a clear question: compare those four models in sequence. As such, it should compare them pairwise; instead, it looks ahead and uses the error term from the last model. I agree with the OP that this is wrong.

Again, the distinction I make here is between sequentially testing the contributions of terms within one model and sequentially comparing a series of models. R's anova() function does the latter wrong.
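
For contrast, here is a hedged sketch of the pairwise analysis I would consider correct, where each comparison uses its own larger model's error term (F values implied by the tables above):

> anova(lm.0, lm.1)   # F = 2011.07 / (194.66/11), about 113.64
> anova(lm.1, lm.2)   # F = 2.91 / (191.75/10), about 0.152
> anova(lm.2, lm.3)   # F = 7.17 / (184.58/9), about 0.350

These are exactly the values that reappear in the Implied_F column computed below.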

Addendum

Moreover, I just noticed that the columns in the multi-model anova() result are misleading, in that they suggest the residual terms and degrees of freedom differ in each row, when in fact all the tests performed use the largest model's error. (This behavior is documented, but the labeling is certainly misleading.)

> aov <- anova(lm.0, lm.1, lm.2, lm.3)
> # The F each row would yield if its own RSS and Res.Df were actually used:
> aov$Implied_F <- aov[["Sum of Sq"]] / (aov[["RSS"]] / aov[["Res.Df"]])
> aov

Analysis of Variance Table

Model 1: pchem2 ~ 1
Model 2: pchem2 ~ pchem1
Model 3: pchem2 ~ pchem1 + mvcalc
Model 4: pchem2 ~ pchem1 + mvcalc + molbio

  Res.Df     RSS Df Sum of Sq       F  Pr(>F) Implied_F
1     12 2205.73                                       
2     11  194.66  1   2011.07 98.0593 0.00000   113.643
3     10  191.75  1      2.91  0.1419 0.71515     0.152
4      9  184.58  1      7.17  0.3497 0.56882     0.350

Addendum 2

Here's the really sad part: they originally had it right, but changed it in August 2000. See https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=621, reported by none other than Brian Ripley. The stated reason was to make it do the same thing as S. There is a mechanism for re-opening a bug, but that sure seems like a huge uphill battle, given the people involved and the potential effect on users over the past 18 years.

Russ Lenth
  • regarding the addendum: this was exactly my concern. If it is using the "best" (largest) model for the denominator, then all of the denominator dfs should be reported as 9. The fact that this is hidden in the documentation, rather than presented in a usable fashion in the output, proved to be very misleading (and time-consuming). – Gregg H Mar 19 '18 at 01:01
  • @GreggH FWIW, I did submit a comment on that entry in the R bug-tracking system. – Russ Lenth Mar 19 '18 at 02:47