Difference between `fix1 + (1|cluster1)` and `fix1 + (1|fix1:cluster1)` when specifying random intercept using lme4 in R

Question

I sometimes see people use two different ways to specify the random effects when fitting multilevel models with lme4 in R.

# y stands in for the response variable
model1 <- lmer(y ~ fix1 + (1|cluster1), data = dat)
model2 <- lmer(y ~ fix1 + (1|fix1:cluster1), data = dat)

Judging by my limited knowledge of multilevel modelling, people seem to use these two formulas for very similar research designs (e.g., fix1 is nested within cluster1).

Could you explain if these are doing the same thing? Are there any times that I should use one over the other?

Thank you.

Added: Thank you for the references, @Stefan. I think the following explanation on the website (http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#model-specification) indicates that the two commands above (i.e., model1 and model2) are technically the same.

site+(1|site:block)

fixed effect of sites plus random variation in intercept among blocks within sites

Based on this, I would like to ask again: are there cases where I should use one over the other? Or do both commands always work exactly the same? Thank you.

See here: http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#model-specification and here: https://stats.stackexchange.com/questions/18428/formula-symbols-for-mixed-model-using-lme4 for model specification in `lme4`. — Stefan, Jan 30 '18 at 02:10
Thank you for the references, @Stefan. I had made some mistake in the formulas above. I revised it. — user8460166, Jan 30 '18 at 02:47
This is a question about R syntax, not statistics per se, and so seems to me to be [off-topic](https://stats.stackexchange.com/help/on-topic) for this stack exchange (but potentially on-topic at Stack Overflow). — Jake Westfall, Jan 30 '18 at 04:15
Possible duplicate of [Difference between (factor|group) and (1|factor:group) specifications in lme4](https://stats.stackexchange.com/questions/302951/difference-between-factorgroup-and-1factorgroup-specifications-in-lme4) — amoeba, Jan 30 '18 at 06:20
@Jake That's a reasonable opinion, but I have reversed the migration to SO because the nature of the answer is statistical, even though extensive code is involved in the examples. — whuber, Jan 30 '18 at 15:32

Stefan · Accepted Answer · 2018-01-30T16:52:29.763

No, model1 and model2 are not the same. model1 fits a random intercept for each level of cluster1, whereas model2 fits a random intercept for each combination of fix1 and cluster1, i.e. for each cluster1 within fix1. Here's a link to another post on this site: Have I correctly specified my model in lmer?

But let's have a quick example to illustrate that. You'll see that the estimated standard deviations for the random terms differ. Also have a look at the line that reports Number of obs: and groups:, which shows that the grouping structure differs as well.

set.seed(123)
df <- data.frame(FIXED=rep(c("A","B","C"), each=12),
                 RANDOM=rep(c("W","X","Y","Z"), each=3),
                 Y=runif(36,1,10))

head(df)
#  FIXED RANDOM        Y
#1     A      W 3.588198
#2     A      W 8.094746
#3     A      W 4.680792
#4     A      X 8.947157
#5     A      X 9.464206
#6     A      X 1.410008

library(lme4)
summary(lmer(Y ~ FIXED + (1|RANDOM), df))
# ...
#    Random effects:
# Groups   Name        Variance Std.Dev.
# RANDOM   (Intercept) 0.4556   0.675   
# Residual             7.1980   2.683   
#Number of obs: 36, groups:  RANDOM, 4

#Fixed effects:
#            Estimate Std. Error t value
#(Intercept)   6.3945     0.8448   7.569
#FIXEDB       -0.1140     1.0953  -0.104
#FIXEDC       -0.3000     1.0953  -0.274
# ...

summary(lmer(Y ~ FIXED + (1|FIXED:RANDOM), df))
# ...
#    Random effects:
# Groups       Name        Variance  Std.Dev. 
# FIXED:RANDOM (Intercept) 7.887e-14 2.808e-07
# Residual                 7.571e+00 2.752e+00
#Number of obs: 36, groups:  FIXED:RANDOM, 12

#Fixed effects:
#            Estimate Std. Error t value
#(Intercept)   6.3945     0.7943   8.051
#FIXEDB       -0.1140     1.1233  -0.101
#FIXEDC       -0.3000     1.1233  -0.267
# ...

Notice the difference in the standard deviation for the random effects but also the difference in the standard error for the fixed effects!
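The differing group counts can be reproduced without fitting a model. The sketch below uses base R only: it rebuilds the example data and counts the levels of the two grouping factors. `interaction()` is used here as a stand-in for what `FIXED:RANDOM` denotes in the formula, and the names `g_model1` and `g_model2` are made up for this illustration.

```r
# Rebuild the example data from above
set.seed(123)
df <- data.frame(FIXED  = rep(c("A", "B", "C"), each = 12),
                 RANDOM = rep(c("W", "X", "Y", "Z"), each = 3),
                 Y      = runif(36, 1, 10))

# Grouping factor implied by (1 | RANDOM): the labels W, X, Y, Z,
# pooled across all levels of FIXED
g_model1 <- factor(df$RANDOM)

# Grouping factor implied by (1 | FIXED:RANDOM): one level per observed
# combination of FIXED and RANDOM
g_model2 <- interaction(df$FIXED, df$RANDOM, drop = TRUE)

nlevels(g_model1)   # number of groups seen by the first model
nlevels(g_model2)   # number of groups seen by the second model
table(g_model2)     # observations per FIXED:RANDOM combination
```

With this data the two counts are 4 and 12, matching the groups: lines in the two summaries, and each combination holds three observations.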

Now if I added another variable that combines RANDOM and FIXED, such as

df$RANDOM2 <- paste(df$FIXED, df$RANDOM, sep="_")

head(df)
#  FIXED RANDOM        Y RANDOM2
#1     A      W 3.588198     A_W
#2     A      W 8.094746     A_W
#3     A      W 4.680792     A_W
#4     A      X 8.947157     A_X
#5     A      X 9.464206     A_X
#6     A      X 1.410008     A_X

and then run

summary(lmer(Y ~ FIXED + (1|RANDOM2), df))
# ...
#    Random effects:
# Groups   Name        Variance  Std.Dev. 
# RANDOM2  (Intercept) 7.887e-14 2.808e-07
# Residual             7.571e+00 2.752e+00
#Number of obs: 36, groups:  RANDOM2, 12

#Fixed effects:
#            Estimate Std. Error t value
#(Intercept)   6.3945     0.7943   8.051
#FIXEDB       -0.1140     1.1233  -0.101
#FIXEDC       -0.3000     1.1233  -0.267
# ...

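you can see that the estimates are the same as in the output above using the random term +(1|FIXED:RANDOM). Why is that? In the data, the labels W, X, Y and Z of RANDOM are reused under every level of FIXED, so RANDOM is only implicitly nested in FIXED: +(1|RANDOM) sees just 4 groups, and the nesting has to be spelled out as +(1|FIXED:RANDOM) to get 12. RANDOM2 is explicitly nested because each of its labels occurs under a single level of FIXED, so +(1|RANDOM2) already gives 12 groups. So the syntax of your random factors depends on how you set up your data table. I hope this example makes the distinction between +(1|RANDOM) and +(1|FIXED:RANDOM) clearer.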
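Whether the labels in a grouping column are reused across levels of the fixed factor can be checked directly before choosing a specification. The following is a minimal sketch in base R; `is_nested_in()` is a hypothetical helper written for this illustration, not an lme4 function.

```r
# Rebuild the example data, including the combined label column
set.seed(123)
df <- data.frame(FIXED  = rep(c("A", "B", "C"), each = 12),
                 RANDOM = rep(c("W", "X", "Y", "Z"), each = 3),
                 Y      = runif(36, 1, 10))
df$RANDOM2 <- paste(df$FIXED, df$RANDOM, sep = "_")

# Hypothetical helper: TRUE if every level of `inner` occurs under exactly
# one level of `outer`, i.e. the labels of `inner` are unique across `outer`
is_nested_in <- function(inner, outer) {
  n_outer <- tapply(outer, inner, function(x) length(unique(x)))
  all(n_outer == 1)
}

is_nested_in(df$RANDOM,  df$FIXED)   # labels W-Z are reused under A, B and C
is_nested_in(df$RANDOM2, df$FIXED)   # labels such as A_W occur under one FIXED level

# Cross-tabulation: every RANDOM label appears in every FIXED column
xtabs(~ RANDOM + FIXED, data = df)
```

Here the first call returns FALSE and the second TRUE. When the raw column fails this check but the design is nested, (1|FIXED:RANDOM) or a recoded column such as RANDOM2 treats each combination as its own group.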

Thank you so much, @Stefan. I understand the difference completely. I really appreciated that you demonstrated the difference using actual codes. If you would not mind, would it be possible for you to provide some examples of research designs when a `random effect` is nested in a `fixed effect`? This may sound like I am asking an obvious thing, but since I just started learning multilevel modelling, I would like to get an idea of how I can relate these difference codes to actual research designs. Thank you. — user8460166, Jan 30 '18 at 19:53
No problem. Glad you understood the difference. The research designs you are looking for are called nested designs. Just grab a text book or ask Google for nested design and mixed effects models. There is a lot of information freely available for you. — Stefan, Jan 30 '18 at 20:25
Thank you very much, @Stefan. I will look for examples by starting googling with 'nested designs' then. Thank you again, and I hope you have a great day! — user8460166, Jan 30 '18 at 20:31
