
I am trying to test the level1 * question interaction term. I have two ways of doing it, anova(lm) and anova(lm, lm2). Why are their results so different?

Code

pacman::p_load(nlme)  # load nlme for lme()
data_long = read.csv("https://raw.githubusercontent.com/slfan2013/Jen-Falbe---menu-labeling---Dec-2020/main/STATA/data_long.csv?token=ADGNCRVI2JVHC72URESWLFS765ALK")[,-1]  # drop the first (index) column
lm = lme(pme ~ level1 * question, random = ~1 | subject, data = data_long)   # with interaction
lm2 = lme(pme ~ level1 + question, random = ~1 | subject, data = data_long)  # main effects only

Results

> anova(lm)
                numDF denDF   F-value p-value
(Intercept)         1  2648 21598.329  <.0001
level1              2  1324     8.981  0.0001
question            2  2648    17.354  <.0001
level1:question     4  2648     0.857  0.4888



> anova(lm, lm2)
    Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lm      1 11 11853.99 11923.15 -5915.995                        
lm2     2  7 11838.61 11882.62 -5912.303 1 vs 2 7.383951  0.1169

I don't understand why the interaction p-value from the first method is 0.4888 while from the second it is 0.1169; the two seem far too different.

WCMC
  • Why would you assume them to be the same? They measure different things. – Tim Dec 31 '20 at 18:08
  • I thought `anova(lm, lm2)` is a likelihood ratio test for the interaction term and `anova(lm)` is a Wald test for the interaction term? – WCMC Dec 31 '20 at 18:10
  • `anova(lm, lm2)` is a likelihood ratio test that compares *two models*; the second gives you tests for the individual effects. – Tim Dec 31 '20 at 18:43
  • But both interpretations focus on the interaction term, so they should not be that different, especially given the fairly large sample size in the data. – WCMC Dec 31 '20 at 18:45
  • You could do a simulation study to see what happens when (1) level1 and question are independent, and (2) they are (highly) correlated. From that, I guess you will gain some insight into your question (see the sketch after these comments). – TrungDung Dec 31 '20 at 19:14
  • That is too complicated. I expect a simpler explanation for such a difference, e.g. a coding error or a missing argument in some R function. By the way, level1 and question ARE independent. – WCMC Dec 31 '20 at 19:16
  • Does this answer your question? [What test exactly does lm.anova perform in R?](https://stats.stackexchange.com/questions/454467/what-test-exactly-does-lm-anova-perform-in-r) – Karolis Koncevičius Dec 31 '20 at 21:25
  • @KarolisKoncevičius No. Based on that question, I should get exactly the same result. – WCMC Dec 31 '20 at 21:39
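
Following TrungDung's suggestion above, here is a minimal simulation sketch. Everything in it is hypothetical: the balanced design, the number of subjects, and the absence of a true interaction are all assumptions made for illustration, not properties of the real data.

# Hedged simulation sketch (all names and sizes hypothetical):
# balanced design, independent factors, subject random intercept, no true interaction.
library(nlme)
set.seed(1)
d <- expand.grid(subject = factor(1:100), level1 = factor(1:3), question = factor(1:3))
d$pme <- rep(rnorm(100), times = 9) + rnorm(nrow(d))  # subject effect + residual noise
full    <- lme(pme ~ level1 * question, random = ~1 | subject, data = d, method = "ML")
reduced <- lme(pme ~ level1 + question, random = ~1 | subject, data = d, method = "ML")
anova(full)           # marginal F test for level1:question
anova(full, reduced)  # likelihood ratio test of the same term

With both models fit by ML, the two interaction p-values should come out close here, since the factors are independent and no true interaction was simulated.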

1 Answer


The default estimation method of lme is restricted maximum likelihood (REML). REML likelihoods are not comparable between models with different fixed-effects structures, so a likelihood ratio test on REML fits is invalid. You need to refit your models with maximum likelihood, method = "ML", to compare them with a likelihood ratio test. Doing so, we see much closer agreement with the F test.

# Refit both models by maximum likelihood so their likelihoods are comparable
lm <- lme(pme ~ level1 * question, random = ~1 | subject, data = data_long, method = "ML")
lm2 <- lme(pme ~ level1 + question, random = ~1 | subject, data = data_long, method = "ML")

anova(lm, lm2)

The output is

    Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lm      1 11 11822.01 11891.20 -5900.007                        
lm2     2  7 11817.45 11861.47 -5901.725 1 vs 2 3.435362  0.4878

If you do not fit your mixed-effects models by maximum likelihood, anova(lm, lm2) will complain with a warning that the REML comparisons are not meaningful!
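
For convenience, the ML refit can also be done with update() instead of retyping the model calls; a small sketch, assuming the REML fits lm and lm2 from the question are still in the workspace:

# Refit the question's REML models by maximum likelihood via update(), then compare.
lm_ml  <- update(lm,  method = "ML")
lm2_ml <- update(lm2, method = "ML")
anova(lm_ml, lm2_ml)  # now a valid likelihood ratio test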

JTH