SUMMARY: Which zero-correlation model is most appropriate depends on the data. There is no universally correct choice.
I will consider the same Machines data set. It has several Workers, each repeatedly tested on all three Machines. The maximal mixed model is thus
lmer(score ~ 1 + Machine + (0 + Machine | Worker), d)
which fits a $3\times 3$ covariance matrix of the random effects. The fixed effects define the mean score for each Machine; there are three Machines, so this is a three-dimensional vector $\mu$. On top of that, each Worker $i$ deviates from this $\mu$ by some "random" three-dimensional vector $\mu_i$. These $\mu_i$ are random vectors with mean zero $(0,0,0)$ and some $3\times 3$ covariance matrix $\Sigma$. Such a covariance matrix has 6 parameters:
$$\Sigma=\begin{bmatrix}\sigma^2_A&\sigma^2_{AB} &\sigma^2_{AC}\\\sigma^2_{AB}&\sigma^2_B&\sigma^2_{BC}\\\sigma^2_{AC}&\sigma^2_{BC}&\sigma^2_C\end{bmatrix}.$$
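For concreteness, here is how this maximal model can be fit and its estimated $\Sigma$ inspected (a minimal sketch, assuming d is the Machines data taken from the nlme package):

library(lme4)
d <- as.data.frame(nlme::Machines)   # columns: Worker, Machine, score

# Maximal model: unconstrained 3x3 covariance matrix of the Worker effects
m_max <- lmer(score ~ 1 + Machine + (0 + Machine | Worker), d)

# Estimated 3x3 Sigma with its 6 free parameters
VarCorr(m_max)$Worker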
Note that
lmer(score ~ 1 + Machine + (1 + Machine | Worker), d)
yields an equivalent model, only parameterized differently. The exact parametrization can also depend on the chosen contrasts, but I find it easiest to discuss this with dummy contrasts, hence my (0 + Machine | Worker) specification above.
The crucial point here is that every model that simplifies the random effect structure can be understood as imposing some specific constraints on $\Sigma$.
The random intercept (1 | Worker) model corresponds to $$\Sigma=\begin{bmatrix}\sigma^2_w&\sigma^2_w &\sigma^2_w\\\sigma^2_w&\sigma^2_w&\sigma^2_w\\\sigma^2_w&\sigma^2_w&\sigma^2_w\end{bmatrix}.$$ Here each Worker gets a random scalar intercept $m_i$, i.e. $\mu_i = (m_i, m_i, m_i)$; the entries of $\mu_i$ are correlated with correlation 1.
The random interaction (1 | Worker:Machine) model corresponds to $$\Sigma=\begin{bmatrix}\sigma^2_{wm}&0&0\\0&\sigma^2_{wm}&0\\0&0&\sigma^2_{wm}\end{bmatrix}.$$ Here $\mu_i$ has three entries with the same variance that are assumed to be uncorrelated.
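For reference, the corresponding full lmer calls look like this (a sketch; the fixed part is the same as above and the model names are mine):

m_ri <- lmer(score ~ 1 + Machine + (1 | Worker), d)           # random intercept
m_wm <- lmer(score ~ 1 + Machine + (1 | Worker:Machine), d)   # random interaction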
In the following, let A, B, and C be dummy variables for the three Machines. Then the (0 + A | Worker) model corresponds to $$\Sigma=\begin{bmatrix}\sigma^2_A&0&0\\0&0&0\\0&0&0\end{bmatrix}.$$ Here $\mu_i$ has only one non-zero entry with variance $\sigma^2_A$. Similarly for (0 + B | Worker) and (0 + C | Worker).
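To fit these single-dummy models directly, the dummies have to exist as columns of d; a minimal sketch (the column names A, B, C are my choice, and the Machine levels in this dataset happen to be literally "A", "B", "C"):

d$A <- as.numeric(d$Machine == "A")
d$B <- as.numeric(d$Machine == "B")
d$C <- as.numeric(d$Machine == "C")

m_A <- lmer(score ~ 1 + Machine + (0 + A | Worker), d)   # only Machine A gets a Worker variance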
The second crucial thing to realize is that a sum of two independent multivariate Gaussians with covariance matrices $\Sigma_1$ and $\Sigma_2$ has covariance matrix $\Sigma_1+\Sigma_2$. So to understand what happens with more complicated random structures, we can simply add up the covariance matrices written above.
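This additivity is easy to verify numerically (a quick simulation sketch using MASS::mvrnorm; the matrices S1 and S2 are arbitrary examples):

set.seed(1)
S1 <- matrix(c(2, 1, 1,
               1, 2, 1,
               1, 1, 2), nrow = 3)        # some covariance matrix
S2 <- diag(c(1, 2, 3))                    # another, independent component
x1 <- MASS::mvrnorm(1e5, mu = rep(0, 3), Sigma = S1)
x2 <- MASS::mvrnorm(1e5, mu = rep(0, 3), Sigma = S2)
round(cov(x1 + x2), 1)                    # approximately S1 + S2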
For example,
lmer(score ~ 1 + Machine + (1 | Worker) + (1 | Worker:Machine), d)
fits a covariance matrix with 2 parameters (this form is known as "compound symmetry"):
$$\Sigma=\begin{bmatrix}\sigma^2_{wm}+\sigma^2_w&\sigma^2_w &\sigma^2_w\\\sigma^2_w&\sigma^2_{wm}+\sigma^2_w&\sigma^2_w\\\sigma^2_w&\sigma^2_w&\sigma^2_{wm}+\sigma^2_w\end{bmatrix}.$$
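One can check this correspondence on the fitted model (a sketch; the variance components are pulled out of VarCorr by their grouping-factor names):

m_cs  <- lmer(score ~ 1 + Machine + (1 | Worker) + (1 | Worker:Machine), d)
vc    <- as.data.frame(VarCorr(m_cs))
s2_w  <- vc$vcov[vc$grp == "Worker"]
s2_wm <- vc$vcov[vc$grp == "Worker:Machine"]
s2_w * matrix(1, 3, 3) + s2_wm * diag(3)   # the implied compound-symmetry Sigma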
The model that Rune Christensen recommends for uncorrelated factors
lmer(score ~ 1 + Machine + (1 + A + B + C || Worker), d)
fits a covariance matrix with 4 parameters, which is a bit more general than compound symmetry (and only 2 parameters away from the maximal model):
$$\Sigma=\begin{bmatrix}\sigma^2_A+\sigma^2_w&\sigma^2_w &\sigma^2_w\\\sigma^2_w&\sigma^2_B+\sigma^2_w&\sigma^2_w\\\sigma^2_w&\sigma^2_w&\sigma^2_C+\sigma^2_w\end{bmatrix}.$$
The model that you have "until recently" had in mind (your m2) is the model that Reinhold Kliegl recommends as the zero-correlation model:
lmer(score ~ 1 + Machine + (1 + c1 + c2 || Worker), d)
If c1 and c2 were produced using the default treatment contrasts (with A being the reference level), then this model can be written as
lmer(score ~ 1 + Machine + (1 + B + C || Worker), d)
I agree with Rune that this is a somewhat unreasonable model because it treats the factor levels differently: B and C get their own variances but A does not (the corresponding $\Sigma$ would look the same as the one above but without $\sigma^2_A$), whereas all three machines should arguably be treated on the same footing.
Thus, the most reasonable sequence of nested models seems to be:
max model --> comp symmetry w/ unequal vars --> comp symmetry --> rand. intercept
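In lme4 this sequence can be fit and compared with likelihood-ratio tests (a sketch; the model names are mine, the dummy columns A, B, C are the ones constructed above, and anova() refits with ML for the comparison):

m_max <- lmer(score ~ 1 + Machine + (0 + Machine | Worker), d)                 # 6 parameters
m_csu <- lmer(score ~ 1 + Machine + (1 + A + B + C || Worker), d)              # 4: comp. symmetry, unequal vars
m_cs  <- lmer(score ~ 1 + Machine + (1 | Worker) + (1 | Worker:Machine), d)    # 2: comp. symmetry
m_ri  <- lmer(score ~ 1 + Machine + (1 | Worker), d)                           # 1: random intercept
anova(m_ri, m_cs, m_csu, m_max)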
A note on marginal distributions
This post was inspired by Rune Christensen's email here https://stat.ethz.ch/pipermail/r-sig-mixed-models/2018q2/026847.html. He talks about $9\times 9$ marginal covariance matrices for the individual observations within each Worker. I find this more difficult to think about than the presentation above. The covariance matrix from Rune's email can be obtained from any $\Sigma$ as $$\Sigma_\text{marginal} = \mathbf{1}_m\mathbf{1}_m^\top \otimes \Sigma + \sigma^2 I,$$ where $\mathbf{1}_m\mathbf{1}_m^\top$ is the $m\times m$ matrix of ones, $m$ is the number of repetitions per Worker/Machine combination (in this dataset $m=3$), and $\sigma^2$ is the residual variance.
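In code, this marginal matrix can be assembled from any $\Sigma$ with a Kronecker product (a sketch; here $\Sigma$ and $\sigma^2$ are taken from the fitted maximal model above, and the 9 observations of one Worker are ordered with the repetition index as the outer blocks):

Sigma  <- VarCorr(m_max)$Worker      # any 3x3 Sigma, here from the maximal model
sigma2 <- sigma(m_max)^2             # residual variance
m <- 3                               # repetitions per Worker/Machine combination
Sigma_marginal <- kronecker(matrix(1, m, m), Sigma) + sigma2 * diag(3 * m)
Sigma_marginal                       # 9x9 marginal covariance within one Worker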
A note on sum contrasts
@statmerkur is asking about sum contrasts. Indeed, it is often recommended to use sum contrasts (contr.sum), especially when there are interactions in the model. I feel that this does not affect anything that I wrote above. E.g. the maximal model will still fit an unconstrained $\Sigma$, but the interpretation of its entries is going to be different (variances and covariances of the grand mean and of the deviations of A and B from the grand mean). The $\Sigma$ in m2 defined using contr.sum will have a structure analogous to the one for (1 + B + C || Worker) above (an intercept variance plus two independent variances), but again with a different interpretation of the entries. Two further comments (a code sketch for the sum-contrast version follows after them):
Rune's critique of m2 still applies: this random effect structure does not treat A, B, and C on the same footing;
The recommendation to use sum contrasts makes sense for the fixed effects (in the presence of interactions). I don't see a reason to necessarily prefer sum contrasts for the random effects, so I think one can, if one wants to, safely use (1+A+B+C || Worker) even if the fixed part uses sum contrasts.
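For completeness, a sketch of how c1 and c2 can be constructed by hand from contr.sum so that they can be used inside the random-effects formula (the column names are my choice):

contr  <- contr.sum(3)                       # 3x2 matrix of sum-contrast codes
d$c1   <- contr[as.integer(d$Machine), 1]    # codes 1, 0, -1 for A, B, C
d$c2   <- contr[as.integer(d$Machine), 2]    # codes 0, 1, -1 for A, B, C
m2_sum <- lmer(score ~ 1 + Machine + (1 + c1 + c2 || Worker), d)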
A note on custom contrasts
I had an email exchange with Reinhold Kliegl about this answer. Reinhold says that in his applied work he prefers (1+c1+c2 || subject) over (1+A+B+C || subject) because he chose c1 and c2 as meaningful contrasts. He wants to be able to interpret $\Sigma$, and he wants its entries to correspond to c1 and c2.
This basically means that Reinhold is fine with rejecting the assumption (that I made above) that the factor levels should be treated equally. He does not care about the individual factor levels at all! If so, then of course it is fine to use (1+c1+c2 || subject). He gives his paper https://www.frontiersin.org/articles/10.3389/fpsyg.2010.00238/full as an example. There a four-level factor is coded with 3 custom contrasts c1, c2, and c3, with the grand mean as the intercept. These specific contrasts are of interest, not the individual factor levels A to D. In this situation I agree that (1+c1+c2+c3 || subject) makes total sense.
But one should be clear that while (1+c1+c2+c3 | subject) does treat the factor levels A to D equally (it merely re-parametrizes $\Sigma$ in terms of the particular contrasts), (1+c1+c2+c3 || subject) will fail to treat the factor levels equally.
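The same point can be checked on the three-level Machines analogue (a sketch, using the hand-made sum-contrast columns c1 and c2 from above): the "|" version is just a reparametrization of the maximal model and should give the same fit, whereas the "||" version imposes genuine, contrast-dependent constraints.

m_full_dummy <- lmer(score ~ 1 + Machine + (0 + Machine | Worker), d)
m_full_sum   <- lmer(score ~ 1 + Machine + (1 + c1 + c2 | Worker), d)
c(logLik(m_full_dummy), logLik(m_full_sum))   # equal up to convergence tolerance

m_zerocorr   <- lmer(score ~ 1 + Machine + (1 + c1 + c2 || Worker), d)
logLik(m_zerocorr)                            # a genuinely more constrained model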