Interpreting nested random effects

Question

I was playing around with some data and had a hard time understanding the meaning of nested effects.

Here's an example of a dataset (selfesteem2 from the package datarium):

dat <- structure(list(id = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 
6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L), .Label = c("1", "2", "3", "4", "5", "6", 
"7", "8", "9", "10", "11", "12"), class = "factor"), treatment = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ctr", "Diet"), class = "factor"), 
    time = c("t1", "t1", "t1", "t1", "t1", "t1", "t1", "t1", 
    "t1", "t1", "t1", "t1", "t1", "t1", "t1", "t1", "t1", "t1", 
    "t1", "t1", "t1", "t1", "t1", "t1", "t2", "t2", "t2", "t2", 
    "t2", "t2", "t2", "t2", "t2", "t2", "t2", "t2", "t2", "t2", 
    "t2", "t2", "t2", "t2", "t2", "t2", "t2", "t2", "t2", "t2", 
    "t3", "t3", "t3", "t3", "t3", "t3", "t3", "t3", "t3", "t3", 
    "t3", "t3", "t3", "t3", "t3", "t3", "t3", "t3", "t3", "t3", 
    "t3", "t3", "t3", "t3"), val = c(83, 97, 93, 92, 77, 72, 
    92, 92, 95, 92, 92, 79, 84, 100, 91, 91, 74, 76, 90, 89, 
    93, 90, 93, 80, 77, 95, 92, 92, 73, 65, 89, 87, 91, 84, 92, 
    69, 86, 99, 91, 92, 76, 75, 87, 89, 94, 93, 92, 80, 69, 88, 
    89, 89, 68, 63, 79, 81, 84, 81, 91, 62, 88, 97, 92, 95, 72, 
    76, 87, 88, 93, 95, 91, 78)), row.names = c(NA, -72L), class = c("tbl_df", 
"tbl", "data.frame"))

library(dplyr)  # provides %>% and arrange()
dat %>% arrange(id)
# A tibble: 72 x 4
   id    treatment time    val
   <fct> <fct>     <chr> <dbl>
 1 1     ctr       t1       83
 2 1     Diet      t1       84
 3 1     ctr       t2       77
 4 1     Diet      t2       86
 5 1     ctr       t3       69
 6 1     Diet      t3       88
 7 2     ctr       t1       97
 8 2     Diet      t1      100
 9 2     ctr       t2       95
10 2     Diet      t2       99
# ... with 62 more rows

> dat$id %>% unique
 [1] 1  2  3  4  5  6  7  8  9  10 11 12
Levels: 1 2 3 4 5 6 7 8 9 10 11 12
> dat$treatment %>% unique
[1] ctr  Diet
Levels: ctr Diet
> dat$time %>% unique
[1] "t1" "t2" "t3"

This is a repeated measures design, meaning that every participant (id) has gone through treatment-ctr and treatment-Diet, in all three time points (t1, t2, t3).
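
One way to check this crossing is to count rows per grouping level. The sketch below is an assumption-laden stand-in: `design` is a hypothetical skeleton that rebuilds only the id, treatment and time columns described above (no outcome values), and the same `table()` calls could be run on `dat` directly.

```r
# Hypothetical skeleton of the design: 12 ids x 2 treatments x 3 time points.
# Only the grouping columns are rebuilt here; no outcome values are invented.
design <- expand.grid(
  id        = factor(1:12),
  treatment = factor(c("ctr", "Diet")),
  time      = c("t1", "t2", "t3"),
  stringsAsFactors = FALSE
)

# Rows per participant: each id contributes 2 treatments x 3 times = 6 rows.
rows_per_id <- table(design$id)

# Rows per treatment:id cell, i.e. the grouping implied by (1 | treatment:id).
cell <- interaction(design$treatment, design$id, drop = TRUE)
rows_per_cell <- table(cell)

nrow(design)                       # total number of rows
unique(as.vector(rows_per_id))     # rows per id
nlevels(cell)                      # number of treatment:id cells
unique(as.vector(rows_per_cell))   # rows per treatment:id cell
```

With this layout every id appears in both treatments, so treatment is crossed with id rather than id being nested in treatment, and each treatment:id cell holds one row per time point.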

If I were to analyze this within the mixed models framework, I would do:

library(lme4)
mod1 <- lmer(val ~ treatment*time + (1|id), data = dat) %>% anova
mod1
Analysis of Variance Table
               npar Sum Sq Mean Sq F value
treatment         1 316.68  316.68  41.037
time              2 258.69  129.35  16.762
treatment:time    2 266.36  133.18  17.258

If I am right, this model estimates the main effects and the interaction of treatment and time while accounting for the fact that the data points are not independent (measurements from the same participant should be more similar to each other, across the cells of the design, than measurements from different participants).
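
Written out, the random-intercept model in mod1 can be sketched as follows. The symbols are introduced here for illustration (i indexes participant, j treatment, k time) and assume the usual independent, normally distributed random terms:

```latex
\text{val}_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + u_i + \varepsilon_{ijk},
\qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{ijk} \sim N(0, \sigma^2)

\operatorname{Corr}(\text{val}_{ijk}, \text{val}_{ij'k'}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2}
\quad \text{for two different observations from the same participant}
```

Under this specification any two measurements from the same participant share the same correlation, regardless of whether they come from the same treatment or the same time point.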

Let's say we specify two further models:

> mod2 <- lmer(val ~ treatment*time + (1|treatment:id), data = dat) %>% anova
mod2
Analysis of Variance Table
               npar  Sum Sq Mean Sq F value
treatment         1   6.518   6.518   1.432
time              2 258.694 129.347  28.417
treatment:time    2 266.361 133.181  29.259

> mod3 <- lmer(val ~ treatment*time + (1|id) + (1|treatment:id), data = dat) %>% anova
mod3
Analysis of Variance Table
               npar  Sum Sq Mean Sq F value
treatment         1  70.738  70.738  15.541
time              2 258.694 129.347  28.417
treatment:time    2 266.361 133.181  29.259
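
The anova tables above only report the fixed-effect rows, so they do not show what each random-effects term estimates. A minimal sketch of how one might inspect that, assuming simulated stand-in data (`sim`, with made-up standard deviations of 8, 3 and 2) rather than the real `dat`; the names `fit1`, `fit3`, `vc1`, `vc3` and `cmp` are introduced here for illustration:

```r
library(lme4)

# Simulated stand-in data (an assumption, not the selfesteem2 values):
# same layout as dat, with made-up variance components.
set.seed(1)
sim <- expand.grid(id = factor(1:12),
                   treatment = factor(c("ctr", "Diet")),
                   time = c("t1", "t2", "t3"))
cell <- interaction(sim$treatment, sim$id, drop = TRUE)
u_id   <- rnorm(12, sd = 8)   # person-level shifts
u_cell <- rnorm(24, sd = 3)   # extra shift per treatment:id cell
sim$val <- 85 + u_id[as.integer(sim$id)] + u_cell[as.integer(cell)] +
  rnorm(nrow(sim), sd = 2)

# Keep the fitted models themselves; piping into anova() discards them.
fit1 <- lmer(val ~ treatment * time + (1 | id), data = sim)
fit3 <- lmer(val ~ treatment * time + (1 | id) + (1 | treatment:id), data = sim)

# Variance components: fit1 has id + Residual, fit3 adds treatment:id.
vc1 <- as.data.frame(VarCorr(fit1))
vc3 <- as.data.frame(VarCorr(fit3))
vc1[, c("grp", "vcov")]
vc3[, c("grp", "vcov")]

# Likelihood-ratio comparison of the two random-effects structures
# (anova() refits both models with ML before comparing).
cmp <- anova(fit1, fit3)
cmp
```

The fixed-effect F values change between specifications because the residual variance used as the denominator changes once part of the variation is assigned to the treatment:id term.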

Does mod2 specify that the same people for the same treatment should be more similar than others?
What kind of dependence does mod3 suggest? What's the difference from mod2?
Do we even need to specify dependence in the sense of (1|treatment:id) if we already account for the treatment as a fixed effect? What do we gain additionally by specifying this as a nested random effect?

score 7 · Accepted Answer · answered Oct 10 '20 at 15:22

Does mod2 specify that the same people for the same treatment should be more similar than others?

mod2 implies that you have repeated measures within every combination of treatment and id. From your description, this does not seem to be the case.

What kind of dependence does mod3 suggest? What's the difference from mod2?

mod3 is also fitting random intercepts for id, which implies that treatment is nested within id. Again, this isn't the case here.

Do we even need to specify dependence in the sense of (1|treatment:id) if we already account for the treatment as a fixed effect?

Since you seem to be interested in the fixed effect for treatment, it does not make sense to also include it as a grouping factor for random intercepts as part of an interaction.

What do we gain additionally by specifying this as a nested random effect?

We gain nothing. Since we don't have nested random effects, the standard errors for the fixed effects estimates will be wrong.

1) But isn't it the case that there are repeated measures within every combination of treatment and id? I updated the original post and added a better view of the data frame I am using (see `dat %>% arrange(id)`). For each combination of treatment and id there are 3 observations (t1, t2 and t3). 2) The logic I followed when specifying mod3 with `(1|id) + (1|treatment:id)` was that we presume answers from the same person will be more similar to each other, and that answers from the same person within a treatment will be even more similar. Does that make sense? – User33268, Oct 10 '20 at 19:51
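
The dependence structure described in point 2 can be written down explicitly. This is a sketch with notation introduced here (i participant, j treatment, k time; u for the id intercept, v for the treatment:id intercept), assuming independent, normally distributed random terms:

```latex
\text{mod3:}\quad \text{val}_{ijk} = \mu_{jk} + u_i + v_{ij} + \varepsilon_{ijk},
\qquad u_i \sim N(0, \sigma_u^2),\; v_{ij} \sim N(0, \sigma_v^2),\; \varepsilon_{ijk} \sim N(0, \sigma^2)

\operatorname{Corr}(\text{val}_{ijk}, \text{val}_{ijk'}) = \frac{\sigma_u^2 + \sigma_v^2}{\sigma_u^2 + \sigma_v^2 + \sigma^2}
\quad (\text{same person, same treatment},\ k \neq k')

\operatorname{Corr}(\text{val}_{ijk}, \text{val}_{ij'k'}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_v^2 + \sigma^2}
\quad (\text{same person, different treatment},\ j \neq j')
```

In mod2 the term u_i is absent, which corresponds to setting \sigma_u^2 = 0: observations from the same person under different treatments are then treated as uncorrelated, while observations within the same treatment:id cell remain correlated.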
