5

I'm following up on this great answer. Suppose I have a cross-level interaction in my mixed-effects model below (ses*sector). My cluster-level predictor "sector" is a binary variable (0=Public, 1=Private). My level-1 predictor ses is numeric.

QUESTION: How should I define my model so that Corr, the correlation between intercepts and slopes, be separately reported for public & private sectors?

Also, what is the difference between the following models in terms of their random-effects structure?

library(lme4)
hsb <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')

original <- summary(lmer(math ~ ses*sector + (ses|sch.id), data = hsb))

     two <- summary(lmer(math ~ ses*sector + (ses*sector|sch.id), data = hsb))
   three <- summary(lmer(math ~ ses*sector + (ses:sector|sch.id), data = hsb))
    four <- summary(lmer(math ~ ses*sector + (ses+sector|sch.id), data = hsb))
Robert Long
  • 53,316
  • 10
  • 84
  • 148
rnorouzian
  • 3,056
  • 2
  • 16
  • 40

2 Answers2

5

First:

what is the difference between the following models in terms of their random-effects structure?

the model:

lmer(math ~ ses*sector + (ses+sector|sch.id), data = hsb)

specifies:

  • random intercepts for sch.id
  • random slopes for ses and sector, both varying within sch.id
  • the 3 correlations between each of those.

The model:

lmer(math ~ ses*sector + (ses*sector|sch.id), data = hsb)

specifies:

  • random intercepts for sch.id
  • random slopes for ses, sector and the interaction between them, all varying within sch.id
  • the 6 correlations between each of those.

The model:

lmer(math ~ ses*sector + (ses:sector|sch.id), data = hsb)

specifies:

  • random intercepts for sch.id
  • random slopes for the interaction between ses and sector - that is, a slope for each unique combination of ses and sector - varying within sch.id. Note that since sector is categorical with 2 levels, this will result in 2 random slopes estimates.
  • the 3 correlations between each of those. The correlation between each of the random slopes and the intercept should answer the next question:

How should I define my model so that Corr, the correlation between intercepts and slopes, be separately reported for public & private sectors?

Edit:

So, it appears that

 lmer(math ~ ses*sector + (ses:sector|sch.id), data = hsb)

will achieve this.

Here is a monte carlo simulation to demonstrate it:

We simulate two datasets, one for each sector, with 20 schools in each.

We set the standard deviations of the random intercepts and slopes to be the same among all schools, but the correlation between them to be different. Here we use -0.4 and 0.4 but the code can easily be adjusted:

First we set up a helper function to let us know if the models have converged normally and if not, why:

# helper function
# Has the model converged ?

hasConverged <- function (mm) {
  
  if ( !class(mm)[1] == "lmerMod") stop("Error must pass a lmerMod object")
  
  retval <- NULL
  
  if(is.null(unlist(mm@optinfo$conv$lme4))) {
    retval = 1
  }
  else {
    if (isSingular(m0)) {
      retval = 0
    } else {
      retval = -1
    }
  }
  return(retval)
}

Now we set up the parameters and data needed:

n.school <- 20

dt0 <- expand.grid(sch.id = 1:n.school, ses = 1:10, sector = c("0"))
dt1 <- expand.grid(sch.id = (n.school+1):(n.school*2), ses = 1:10, sector = c("1"))
dt0$Y <- 1
X <- model.matrix( ~ ses, data = dt0)

myFormula <- "Y ~ ses + (ses | sch.id)"

foo <- lFormula(eval(myFormula), dt0)
Z <- t(as.matrix(foo$reTrms$Zt))

betas_0 <- c(1, 2)        
betas_1 <- c(1, 3)  # use different fixed effect for ses to create an interaction in the combined model

s1 <- 2 #  SD of random intercepts
s2 <- 1 #  SD of random slopes

rho_0 <- 0.4  # correlation between intercepts and slopes for sector 0
cormat_0 <-  matrix(c(s1, rho_0, rho_0, s2), 2, 2)  # correlation matrix for sector 0
covmat_0 <- lme4::sdcor2cov(cormat_0)    # covariance matrix (needed for mvrnorm)


rho_1 <- -0.4 # correlation between intercepts and slopes for sector 1
cormat_1 <-  matrix(c(s1, rho_1, rho_1, s2), 2, 2)  # correlation matrix for sector 1
covmat_1 <- lme4::sdcor2cov(cormat_1)    # covariance matrix (needed for mvrnorm)

then perform the simulations:

n.sim <- 250
vec.sim.rho_0.sep <- numeric(n.sim)
vec.sim.rho_1.sep <- numeric(n.sim)
vec.sim.rho_0.comb <- numeric(n.sim)
vec.sim.rho_1.comb <- numeric(n.sim)

for (i in 1:n.sim) {

  set.seed(i)

  # simulate the random effects
  
  umat_0 <- MASS::mvrnorm(n.school, c(0, 0), covmat_0, empirical = TRUE)  
  u_0 <- c(rbind(umat_0[, 1], umat_0[, 2]))  # lme4 needs the random effects in this order interleaved)

  e_0 <- rnorm(nrow(dt0), 0, 2)   # residual error

  dt0$Y <- X %*% betas_0 + Z %*% u_0 + e_0

  # fit the model for sector 0
  m0 <- lmer(myFormula, dt0)

  # simulate random effects for sector 1
  umat_1 <- MASS::mvrnorm(n.school, c(0, 0), covmat_1, empirical = TRUE)  

  u_1 <- c(rbind(umat_1[, 1], umat_1[, 2]))  # lme4 needs the random effects in this order interleaved)

  e_1 <- rnorm(nrow(dt1), 0, 2)   # residual error

  dt1$Y <- X %*% betas_1 + Z %*% u_1 + e_1

  # fit the model for sector 1
  m1 <- lmer(myFormula, dt1)

  # combine data:
  dt <- rbind(dt0, dt1)

  # fit the full model
  m <- lmer(Y ~ ses*sector + (ses:sector | sch.id), dt)

  If all models converged properly then extract the correlations
  if (hasConverged(m0) == 1 && hasConverged(m1) == 1 && hasConverged(m) == 1) {
    print(paste(i, "OK", sep = ": "))
    vc_0 <- as.data.frame(VarCorr(m0))
    vc_1 <- as.data.frame(VarCorr(m1))
    vc <- as.data.frame(VarCorr(m))
  
    vec.sim.rho_0.sep[i] <- vc_0$sdcor[3]
vec.sim.rho_1.sep[i] <- vc_1$sdcor[3]
    vec.sim.rho_0.comb[i] <- vc$sdcor[4]
vec.sim.rho_1.comb[i] <- vc$sdcor[5]
  
  } else {
    print(paste(i, "Not converged", sep = ": "))
    vec.sim.rho_0.sep[i] <- NA
    vec.sim.rho_1.sep[i] <- NA
    vec.sim.rho_0.comb[i] <- NA
    vec.sim.rho_1.comb[i] <- NA
  }
}

The first two vectors, vec.sim.rho_0.sep and vec.sim.rho_1.sep are the estimated correlations from the models run on the seperate datasets.

vec.sim.rho_0.comb and vec.sim.rho_1.comb are the estimated correlations from the models run on the full dataset

> mean(vec.sim.rho_0.sep, na.rm = TRUE) ; mean(vec.sim.rho_0.comb, na.rm = TRUE)
[1] 0.4355072
[1] 0.4320562
> mean(vec.sim.rho_1.sep, na.rm = TRUE) ; mean(vec.sim.rho_1.comb, na.rm = TRUE)
[1] -0.4015803
[1] -0.4039854
Robert Long
  • 53,316
  • 10
  • 84
  • 148
  • 1
    If you were fitting a random slope for another level 1 variable (e.g., SES), you could investigate sector heteroskedasticity at level 2 (differing random intercept, slope, and covariances by sector). But as you say, I don't see how you can do it given the presented models without fitting two separate models. This dataset is fairly large so it might work. – Erik Ruzek Sep 25 '20 at 19:41
  • 1
    @ErikRuzek agreed. I've done some simulations where I created random effects where the correlations were quite distinct between the two groups (say -0.5 and 0.5). I couldn't find a way to recover these with a model on the whole dataset, but with seperate models it was obviously OK. I think this is an interesting and practical question. – Robert Long Sep 25 '20 at 19:46
  • 1
    Robert, just in case interested [HERE](https://stats.stackexchange.com/questions/489227/degenerate-or-singular-covariance-matrices-in-mixed-effects-model-and-correlatio) is another question. – rnorouzian Sep 26 '20 at 06:38
  • 1
    @ErikRuzek I actually found a way to do it, using just the interaction term for the random slopes. I've added code. – Robert Long Sep 26 '20 at 12:31
  • Rob, all the models and simulation above are taking the cross-level interaction as being random and thus varying across level of the `sch.id`. But as you noted elsewhere, we can't take a cross-level interaction varying across level of the a grouping variable. Am I missing something? – rnorouzian Oct 08 '20 at 05:07
  • I don't think I said the interaction couldn't vary. I think it can. If I did say that then I may have been mistaken. It's the group level predictor that can't. – Robert Long Oct 08 '20 at 08:03
  • I'm coming back to this question after a while. But given that `sector` is a level-2 predictor that doesn't vary in `id`, I think the following models are partly incorrect, right? `lmer(math ~ ses*sector + (ses+sector|sch.id), data = hsb) ; lmer(math ~ ses*sector + (ses*sector|sch.id), data = hsb)` – rnorouzian Jul 22 '21 at 21:08
1

This can probably be done with a dummy variable, i.e.

~ ses*sector + (dummy(sector,"Public"):ses|sch.id)
             + (dummy(sector,"Private"):ses|sch.id)

the dummy() function will return a variable that is 1 for the focal level and 0 otherwise (you could alternately just create a variable public_dummy <- as.numeric(sector=="Public"). Interactions with a numeric covariate will multiply the predicted response by the numeric value, so using a binary variable will cancel out the corresponding terms in the model.

Ben Bolker
  • 34,308
  • 2
  • 93
  • 126
  • 1
    This doesn't seem to work well for me. If I add this method to the simulations which I just edited into the end of my answer, I get 0.62 and -0.57 which is quite a bit different from the 0.4 and -0.4 that I used to simulate the data with. – Robert Long Sep 26 '20 at 08:42