11

I'm trying to reproduce several interaction test between with both lm and lmer on repeated measures (2x2x2). The reason I want to compare both methods is because SPSS's GLM for repeated measures yields the exact same results as the lm approach presented here, so at the end I want to compare SPSS vs R-lmer. So far, I've only managed to reproduce (closely) some of these interactions.

You'll find below a script to better illustrate my point:

library(data.table)
library(tidyr)
library(lmerTest)
library(MASS)

set.seed(1)

N     <- 100 # number of subjects
sigma <- 1   # popuplation sd
rho   <- .6  # correlation between variables

# X1:   a  a  a  a  b  b  b  b
# X2:   a  a  b  b  a  a  b  b
# X3:   a  b  a  b  a  b  a  b
mu <- c(5, 3, 3, 5, 3, 5, 5, 3) # means

# Simulate the data
sigma.mat <- rep(sigma, length(mu))
S <- matrix(sigma.mat, ncol = length(sigma.mat), nrow = length(sigma.mat))
Sigma <- t(S) * S * rho  
diag(Sigma) <- sigma**2
X <- data.table( mvrnorm(N, mu, Sigma) )
setnames(X, names(X), c("aaa", "aab", "aba", "abb", "baa", "bab", "bba", "bbb"))
X[, id := 1:.N]

# Long format
XL <- data.table( gather(X, key, Y, aaa:bbb) )
XL[, X1 := substr(key, 1, 1)]
XL[, X2 := substr(key, 2, 2)]
XL[, X3 := substr(key, 3, 3)]

# Recode long format (a = +1; b = -1)
XL[, X1c := ifelse(X1 == "a", +1, -1)]
XL[, X2c := ifelse(X2 == "a", +1, -1)]
XL[, X3c := ifelse(X3 == "a", +1, -1)]


### Composite scores to be used with lm
# X2:X3 2-way interaction (for half the data; i.e. when X1 == "a")
X[, X1a_X2.X3 := (aaa - aab) - (aba - abb)]

# X2:X3 2-way interaction (for all the data)
X[, aa := (aaa + baa) / 2]
X[, ab := (aab + bab) / 2]
X[, ba := (aba + bba) / 2]
X[, bb := (abb + bbb) / 2]
X[, X2.X3 := (aa - ab) - (ba - bb)]

# X1:X2:X3 3-way interaction (for all the data)
X[, X1.X2.X3 := ( (aaa - aab) - (aba - abb) ) - ( (baa - bab) - (bba - bbb) )]


### Fit models
# X2:X3 2-way interaction (for half the data; i.e. when X1 == "a")
summary( lm(X1a_X2.X3 ~ 1, X) ) # t = 34.13303
summary( lmer(Y ~ X2c*X3c + (X2c+X3c|id), XL[X1 == "a"]) ) # t = 34.132846  close match
summary( lmer(Y ~ X2c*X3c + (X2c+X3c||id), XL[X1 == "a"]) ) # t = 34.134624  close match

# X2:X3 2-way interaction (for all the data) 
summary( lm(X2.X3 ~ 1, X) ) # t = 0.3075025
summary( lmer(Y ~ X2c*X3c + (X2c+X3c|id), XL) ) # t = 0.1641932
summary( lmer(Y ~ X2c*X3c + (X2c+X3c||id), XL) ) # t = 0.1640710
summary( lmer(Y ~ X2c*X3c + (X2c*X3c|id), XL) ) # t = 0.1641765
anova(   lmer(Y ~ X2c*X3c + (X2c*X3c|id), XL), ddf = "Kenward-Roger" ) # t = 0.1643168
summary( lmer(Y ~ X2c*X3c + (X2c*X3c|id), XL, REML = FALSE) ) # t = 0.1645303
summary( lmer(Y ~ X2c*X3c + (X2c*X3c||id), XL) ) # t = 0.1640704

# X1:X2:X3 3-way interaction (for all the data)
summary( lm(X1.X2.X3 ~ 1, X) ) # t = 46.50177
summary( lmer(Y ~ X1c*X2c*X3c + (X1c*X2c*X3c - X1c:X2c:X3c|id), XL) ) # t = 49.0317599
anova(   lmer(Y ~ X1c*X2c*X3c + (X1c*X2c*X3c - X1c:X2c:X3c|id), XL), ddf = "Kenward-Roger" ) # t = 49.03176
summary( lmer(Y ~ X1c*X2c*X3c + (X1c*X2c*X3c - X1c:X2c:X3c|id), XL, REML = FALSE) ) # t = 49.2677606
summary( lmer(Y ~ X1c*X2c*X3c + (X1c*X2c*X3c - X1c:X2c:X3c||id), XL) ) # t = 46.5193774 close match
summary( lmer(Y ~ X1c*X2c*X3c + (X1c*X2c*X3c|id), XL) ) # unidentifiable
summary( lmer(Y ~ X1c*X2c*X3c + (X1c*X2c*X3c|id), XL,
              control = lmerControl(check.nobs.vs.nRE="ignore")) ) # t = 46.5148684 close match

As you can see from above, none of the lm estimations are exactly matched by the lmer ones. Although some of the results are very similar and may differ only due because of numerical/computational reasons. The gap between both estimation method is specially large for the X2:X3 2-way interaction (for all the data).

My question is if there's a way to obtain the exact same results with both methods, and if there is a correct way to perform the analyses with lmer (although it may not match the lm results).


Bonus:

I've noticed that the t value associated with the 3-way interaction is affected by the way factors are coded, which seems very strange to me:

summary( lmer(Y ~ X1*X2*X3 + (X1*X2*X3 - X1:X2:X3||id), XL) ) # t = 48.36
summary( lmer(Y ~ X1c*X2c*X3c + (X1c*X2c*X3c - X1c:X2c:X3c||id), XL) ) # t = 56.52
mat
  • 639
  • 1
  • 4
  • 19
  • 1
    +1 because it looks interesting but I have no idea what you are doing here :) Can you explain in words or math why these lm and lmer calls should yield the same coefficients? And what's the logic behind this whole exercise? – amoeba Apr 15 '19 at 13:15
  • @amoeba I updated my post to clarify the purpose of this post. Basically, I want to reproduce the results from SPSS (which can be translated into an `lm` model) with `lmer`, and also know what is the *correct* `lmer` analyses for this kind of data. – mat Apr 15 '19 at 14:39
  • The reason for the large discrepancy in case of the two-way interaction for the full data is that you have 2 data points per parameter combination. The intuition is that the effective sample size for a mixed model is 2x smaller than for `lm`; I suspect that's why the t-statistic is roughly two times smaller in `lmer`. You would probably be able to observe the same phenomenon using a simpler 2x2 design and looking at the main effects, without bothering with 2x2x2 and complicated interactions. – amoeba Apr 15 '19 at 22:51

1 Answers1

3

Strange, when I use your last model, I find a perfect match, not a close match:

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  3.91221    0.07242 99.00001  54.025   <2e-16 ***
X1c          0.03277    0.05006 99.00000   0.655    0.514    
X2c         -0.04836    0.04644 99.00000  -1.042    0.300    
X3c          0.04248    0.05009 99.00001   0.848    0.398    
X1c:X2c      0.08370    0.08747 98.99998   0.957    0.341    
X1c:X3c     -0.07025    0.08895 98.99994  -0.790    0.432    
X2c:X3c     -0.02957    0.09616 99.00000  -0.308    0.759    
X1c:X2c:X3c -8.14099    0.17507 99.00003 -46.502   <2e-16 ***
amoeba
  • 93,463
  • 28
  • 275
  • 317
user244839
  • 31
  • 1
  • 1
    Just to be clear, which model are you referring to? – mat Apr 15 '19 at 15:36
  • summary( lmer(Y ~ X1c*X2c*X3c + (X1c*X2c*X3c|id), XL, control = lmerControl(check.nobs.vs.nRE="ignore")) ) – user244839 Apr 16 '19 at 07:58
  • This is indeed very strange! `summary( lmer(Y ~ X1c*X2c*X3c + (X1c*X2c*X3c|id), XL, control=lmerControl(check.nobs.vs.nRE="ignore")) )$coefficients` returns `t = 46.5148684` for me. Might be a version issue? I'm using `R version 3.5.3 (2019-03-11)` and `lmerTest 3.1-0`. – mat Apr 16 '19 at 09:17
  • I have the same R & lmerTest versions as @mat and get the same results as them (albeit with many warnings - failure to converge, etc). – mkt Apr 18 '19 at 08:27
  • @mkt this is so strange! I even tried from a different PC and I still get these results. How can this be possible? Which version of `lme4` do you have? I've `lme4 1.1-21` – mat Apr 19 '19 at 09:58
  • 1
    @mat Perhaps I wasn't clear - I'm getting the same results as you! I think you're probably right that user244839 is using a different version than we are. – mkt Apr 19 '19 at 13:16