I was playing around with some data and had a hard time understanding the meaning of nested effects.
Here's an example dataset (selfesteem2 from the datarium package):
dat <- structure(list(id = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L), .Label = c("1", "2", "3", "4", "5", "6",
"7", "8", "9", "10", "11", "12"), class = "factor"), treatment = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ctr", "Diet"), class = "factor"),
time = c("t1", "t1", "t1", "t1", "t1", "t1", "t1", "t1",
"t1", "t1", "t1", "t1", "t1", "t1", "t1", "t1", "t1", "t1",
"t1", "t1", "t1", "t1", "t1", "t1", "t2", "t2", "t2", "t2",
"t2", "t2", "t2", "t2", "t2", "t2", "t2", "t2", "t2", "t2",
"t2", "t2", "t2", "t2", "t2", "t2", "t2", "t2", "t2", "t2",
"t3", "t3", "t3", "t3", "t3", "t3", "t3", "t3", "t3", "t3",
"t3", "t3", "t3", "t3", "t3", "t3", "t3", "t3", "t3", "t3",
"t3", "t3", "t3", "t3"), val = c(83, 97, 93, 92, 77, 72,
92, 92, 95, 92, 92, 79, 84, 100, 91, 91, 74, 76, 90, 89,
93, 90, 93, 80, 77, 95, 92, 92, 73, 65, 89, 87, 91, 84, 92,
69, 86, 99, 91, 92, 76, 75, 87, 89, 94, 93, 92, 80, 69, 88,
89, 89, 68, 63, 79, 81, 84, 81, 91, 62, 88, 97, 92, 95, 72,
76, 87, 88, 93, 95, 91, 78)), row.names = c(NA, -72L), class = c("tbl_df",
"tbl", "data.frame"))
library(dplyr)  # provides %>% and arrange()
dat %>% arrange(id)
# A tibble: 72 x 4
id treatment time val
<fct> <fct> <chr> <dbl>
1 1 ctr t1 83
2 1 Diet t1 84
3 1 ctr t2 77
4 1 Diet t2 86
5 1 ctr t3 69
6 1 Diet t3 88
7 2 ctr t1 97
8 2 Diet t1 100
9 2 ctr t2 95
10 2 Diet t2 99
# ... with 62 more rows
dat$id %>% unique
[1] 1 2 3 4 5 6 7 8 9 10 11 12
Levels: 1 2 3 4 5 6 7 8 9 10 11 12
dat$treatment %>% unique
[1] ctr Diet
Levels: ctr Diet
dat$time %>% unique
[1] "t1" "t2" "t3"
This is a repeated measures design, meaning that every participant (id) went through both treatments (ctr and Diet) at all three time points (t1, t2, t3).
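To double-check that layout, here is a minimal sketch that rebuilds only the design of dat (not the val column) with expand.grid and counts the rows in each id-by-treatment cell. The names design, cells and fully_crossed are my own, introduced purely for illustration; the same xtabs call should work on dat directly.

```r
# Rebuild only the design layout of `dat` (no response values):
# 12 ids x 2 treatments x 3 time points, one row per combination.
design <- expand.grid(
  id        = factor(1:12),
  treatment = factor(c("ctr", "Diet")),
  time      = c("t1", "t2", "t3"),
  stringsAsFactors = FALSE
)

# Rows per id-by-treatment cell; each cell should hold the 3 time points.
cells <- xtabs(~ id + treatment, data = design)
cells

# TRUE when every id appears under both treatments at all three time points,
# i.e. id is fully crossed with treatment and time.
fully_crossed <- all(cells == 3) && nrow(design) == 72
fully_crossed
```

If every cell holds 3 rows, then id is crossed with treatment rather than nested within it, which is part of why I am unsure how to read the nested-looking term in the models that follow.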
If I were to analyze this within the mixed models framework, I would do:
library(lme4)
mod1 <- lmer(val ~ treatment*time + (1|id), data = dat) %>% anova
mod1
Analysis of Variance Table
npar Sum Sq Mean Sq F value
treatment 1 316.68 316.68 41.037
time 2 258.69 129.35 16.762
treatment:time 2 266.36 133.18 17.258
If I am right, this model estimates the main effects and the interaction of treatment and time while accounting for the fact that the data points are not independent (measurements from the same participant should be more similar to each other than measurements from different participants).
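Written out, my understanding is that the model behind mod1 is the following random-intercept model. The symbols are my own notation (i indexes participant, j treatment, k time), not anything lme4 prints:

$$
y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + u_i + \varepsilon_{ijk},
\qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{ijk} \sim N(0, \sigma^2)
$$

Under this formulation, any two observations from the same participant share the same $u_i$, so their correlation would be $\sigma_u^2 / (\sigma_u^2 + \sigma^2)$ regardless of which treatment or time point they come from.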
Let's say we specify two further models:
mod2 <- lmer(val ~ treatment*time + (1|treatment:id), data = dat) %>% anova
mod2
Analysis of Variance Table
npar Sum Sq Mean Sq F value
treatment 1 6.518 6.518 1.432
time 2 258.694 129.347 28.417
treatment:time 2 266.361 133.181 29.259
mod3 <- lmer(val ~ treatment*time + (1|id) + (1|treatment:id), data = dat) %>% anova
mod3
Analysis of Variance Table
npar Sum Sq Mean Sq F value
treatment 1 70.738 70.738 15.541
time 2 258.694 129.347 28.417
treatment:time 2 266.361 133.181 29.259
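To make the grouping structures concrete, here is a small base-R sketch that counts the levels of the two grouping factors used in the random-effect terms. It assumes, as far as I understand lme4's formula syntax, that (1|treatment:id) groups observations by the observed treatment-by-id combinations. The objects design, trt_id and the n_/obs_ variables are hypothetical names of mine, and the design is rebuilt without the val column.

```r
# Same layout as `dat`: 12 ids x 2 treatments x 3 time points.
design <- expand.grid(
  id        = factor(1:12),
  treatment = factor(c("ctr", "Diet")),
  time      = c("t1", "t2", "t3"),
  stringsAsFactors = FALSE
)

# Grouping factor behind (1 | id): one level per participant.
n_id <- nlevels(design$id)

# Grouping factor behind (1 | treatment:id): one level per observed
# treatment-by-participant combination (assumption about lme4's handling).
trt_id   <- interaction(design$treatment, design$id, drop = TRUE)
n_trt_id <- nlevels(trt_id)

# Observations sharing one random intercept under each grouping.
obs_per_id     <- unique(as.vector(table(design$id)))
obs_per_trt_id <- unique(as.vector(table(trt_id)))

c(n_id = n_id, n_trt_id = n_trt_id,
  obs_per_id = obs_per_id, obs_per_trt_id = obs_per_trt_id)
# n_id = 12, n_trt_id = 24, obs_per_id = 6, obs_per_trt_id = 3
```

So mod1 has 12 random intercepts that each cover 6 observations, mod2 has 24 that each cover the 3 time points of one participant under one treatment, and mod3 has both sets.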
- Does mod2 specify that measurements from the same person under the same treatment should be more similar to each other than to any other measurements?
- What kind of dependence does mod3 imply, and how does it differ from mod2?
- Do we even need to specify dependence in the sense of (1|treatment:id) if we already account for treatment as a fixed effect? What do we gain by additionally specifying this as a nested random effect?