
I am trying to test the level1 * question interaction term. I have two ways of doing it, anova(lm) and anova(lm, lm2). Why are their results so different?

Code

pacman::p_load(nlme)  # load nlme for lme()
data_long = read.csv("https://raw.githubusercontent.com/slfan2013/Jen-Falbe---menu-labeling---Dec-2020/main/STATA/data_long.csv?token=ADGNCRVI2JVHC72URESWLFS765ALK")[,-1]  # drop the first (index) column
lm = lme(pme ~ level1 * question, random = ~1 | subject, data = data_long)   # with interaction
lm2 = lme(pme ~ level1 + question, random = ~1 | subject, data = data_long)  # main effects only

Results

> anova(lm)
                numDF denDF   F-value p-value
(Intercept)         1  2648 21598.329  <.0001
level1              2  1324     8.981  0.0001
question            2  2648    17.354  <.0001
level1:question     4  2648     0.857  0.4888



> anova(lm, lm2)
    Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lm      1 11 11853.99 11923.15 -5915.995                        
lm2     2  7 11838.61 11882.62 -5912.303 1 vs 2 7.383951  0.1169

I don't understand why the interaction p-value from the first method is 0.4888 while from the second it is 0.1169; the two seem far too different.

WCMC
  • Why would you assume them to be the same? They measure different things. – Tim Dec 31 '20 at 18:08
  • I thought `anova(lm, lm2)` is a likelihood ratio test for the interaction term and `anova(lm)` is a Wald test for the interaction term? – WCMC Dec 31 '20 at 18:10
  • `anova(lm, lm2)` is a likelihood ratio test that compares *two models*; the second gives you tests for the individual effects. – Tim Dec 31 '20 at 18:43
  • But both interpretations focus on the interaction term, so they should not be that different, especially given the fairly large sample size in the data. – WCMC Dec 31 '20 at 18:45
  • You could do a simulation study to see what happens when (1) level1 and question are independent, and (2) they are (highly) correlated. From that, I guess you will gain some insight into your question (see the sketch after these comments). – TrungDung Dec 31 '20 at 19:14
  • That is too complicated. I expect a simpler explanation for such a difference, e.g. a coding error or a missing argument in some R function. By the way, level1 and question ARE independent. – WCMC Dec 31 '20 at 19:16
  • Does this answer your question? [What test exactly does lm.anova perform in R?](https://stats.stackexchange.com/questions/454467/what-test-exactly-does-lm-anova-perform-in-r) – Karolis Koncevičius Dec 31 '20 at 21:25
  • @KarolisKoncevičius No. Based on that question, I should get exactly the same result. – WCMC Dec 31 '20 at 21:39
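
Following TrungDung's suggestion above, here is a minimal simulation sketch. Everything in it is hypothetical: the balanced design, the number of subjects, and the absence of a true interaction are all assumptions made for illustration, not properties of the real data.

# Hedged simulation sketch (all names and sizes hypothetical):
# balanced design, independent factors, subject random intercept, no true interaction.
library(nlme)
set.seed(1)
d <- expand.grid(subject = factor(1:100), level1 = factor(1:3), question = factor(1:3))
d$pme <- rep(rnorm(100), times = 9) + rnorm(nrow(d))  # subject effect + residual noise
full    <- lme(pme ~ level1 * question, random = ~1 | subject, data = d, method = "ML")
reduced <- lme(pme ~ level1 + question, random = ~1 | subject, data = d, method = "ML")
anova(full)           # marginal F test for level1:question
anova(full, reduced)  # likelihood ratio test of the same term

With both models fit by ML, the two interaction p-values should come out close here, since the factors are independent and no true interaction was simulated.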

1 Answer


The default estimation method of lme is restricted maximum likelihood (REML). REML likelihoods are not comparable between models with different fixed-effects structures, so a likelihood ratio test on REML fits is invalid. You need to refit your models with maximum likelihood, method = "ML", to compare them with a likelihood ratio test. Doing so, we see much closer agreement with the F test.

# Refit both models by maximum likelihood so their likelihoods are comparable
lm <- lme(pme ~ level1 * question, random = ~1 | subject, data = data_long, method = "ML")
lm2 <- lme(pme ~ level1 + question, random = ~1 | subject, data = data_long, method = "ML")

anova(lm, lm2)

The output is

    Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lm      1 11 11822.01 11891.20 -5900.007                        
lm2     2  7 11817.45 11861.47 -5901.725 1 vs 2 3.435362  0.4878

If you do not fit your mixed-effects models by maximum likelihood, anova(lm, lm2) will complain with a warning that the REML comparisons are not meaningful!
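
For convenience, the ML refit can also be done with update() instead of retyping the model calls; a small sketch, assuming the REML fits lm and lm2 from the question are still in the workspace:

# Refit the question's REML models by maximum likelihood via update(), then compare.
lm_ml  <- update(lm,  method = "ML")
lm2_ml <- update(lm2, method = "ML")
anova(lm_ml, lm2_ml)  # now a valid likelihood ratio test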

JTH