SUMMARY: Which zero-correlation model is most appropriate depends on the data. There is no universally correct choice.
I will consider the same Machines data set. It has several Workers, each repeatedly tested on all three Machines. The maximal mixed model is thus
lmer(score ~ 1 + Machine + (0 + Machine | Worker), d)
which fits a $3\times 3$ covariance matrix of the random effects. The fixed effects define the mean score for each Machine; there are three Machines, so this is a three-dimensional vector $\mu$. On top of that, each Worker $i$ deviates from this $\mu$ by some "random" three-dimensional vector $\mu_i$. These $\mu_i$ are random vectors with mean zero $(0,0,0)$ and some $3\times 3$ covariance matrix $\Sigma$. Such a covariance matrix has 6 parameters:
$$\Sigma=\begin{bmatrix}\sigma^2_A&\sigma^2_{AB} &\sigma^2_{AC}\\\sigma^2_{AB}&\sigma^2_B&\sigma^2_{BC}\\\sigma^2_{AC}&\sigma^2_{BC}&\sigma^2_C\end{bmatrix}.$$
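For concreteness, here is how this maximal model can be fit and its estimated $\Sigma$ inspected (a minimal sketch, assuming d is the Machines data taken from the nlme package):

library(lme4)
d <- as.data.frame(nlme::Machines)   # columns: Worker, Machine, score

# Maximal model: unconstrained 3x3 covariance matrix of the Worker effects
m_max <- lmer(score ~ 1 + Machine + (0 + Machine | Worker), d)

# Estimated 3x3 Sigma with its 6 free parameters
VarCorr(m_max)$Worker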
Note that
lmer(score ~ 1 + Machine + (1 + Machine | Worker), d)
yields an equivalent model, only parameterized differently. The exact parametrization can also depend on the chosen contrasts, but I find it easiest to discuss this with dummy contrasts, hence my (0 + Machine | Worker) specification above.
The crucial point here is that every model that simplifies the random effect structure can be understood as imposing some specific constraints on $\Sigma$.
The random intercept (1 | Worker) model corresponds to $$\Sigma=\begin{bmatrix}\sigma^2_w&\sigma^2_w &\sigma^2_w\\\sigma^2_w&\sigma^2_w&\sigma^2_w\\\sigma^2_w&\sigma^2_w&\sigma^2_w\end{bmatrix}.$$ Here each Worker gets a random scalar intercept $m_i$, i.e. $\mu_i = (m_i, m_i, m_i)$; the entries of $\mu_i$ are correlated with correlation 1.
The random interaction (1 | Worker:Machine) model corresponds to $$\Sigma=\begin{bmatrix}\sigma^2_{wm}&0&0\\0&\sigma^2_{wm}&0\\0&0&\sigma^2_{wm}\end{bmatrix}.$$ Here $\mu_i$ has three entries with the same variance that are assumed to be uncorrelated.
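For reference, the corresponding full lmer calls look like this (a sketch; the fixed part is the same as above and the model names are mine):

m_ri <- lmer(score ~ 1 + Machine + (1 | Worker), d)           # random intercept
m_wm <- lmer(score ~ 1 + Machine + (1 | Worker:Machine), d)   # random interaction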
In the following, let A, B, and C be dummy variables for the three Machines. Then the (0 + A | Worker) model corresponds to $$\Sigma=\begin{bmatrix}\sigma^2_A&0&0\\0&0&0\\0&0&0\end{bmatrix}.$$ Here $\mu_i$ has only one non-zero entry with variance $\sigma^2_A$. Similarly for (0 + B | Worker) and (0 + C | Worker).
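To fit these single-dummy models directly, the dummies have to exist as columns of d; a minimal sketch (the column names A, B, C are my choice, and the Machine levels in this dataset happen to be literally "A", "B", "C"):

d$A <- as.numeric(d$Machine == "A")
d$B <- as.numeric(d$Machine == "B")
d$C <- as.numeric(d$Machine == "C")

m_A <- lmer(score ~ 1 + Machine + (0 + A | Worker), d)   # only Machine A gets a Worker variance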
The second crucial thing to realize is that a sum of two independent multivariate Gaussians with covariance matrices $\Sigma_1$ and $\Sigma_2$ has covariance matrix $\Sigma_1+\Sigma_2$. So to understand what happens with more complicated random structures, we can simply add up the covariance matrices written above.
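This additivity is easy to verify numerically (a quick simulation sketch using MASS::mvrnorm; the matrices S1 and S2 are arbitrary examples):

set.seed(1)
S1 <- matrix(c(2, 1, 1,
               1, 2, 1,
               1, 1, 2), nrow = 3)        # some covariance matrix
S2 <- diag(c(1, 2, 3))                    # another, independent component
x1 <- MASS::mvrnorm(1e5, mu = rep(0, 3), Sigma = S1)
x2 <- MASS::mvrnorm(1e5, mu = rep(0, 3), Sigma = S2)
round(cov(x1 + x2), 1)                    # approximately S1 + S2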
For example,
lmer(score ~ 1 + Machine + (1 | Worker) + (1 | Worker:Machine), d)
fits a covariance matrix with 2 parameters (this form is known as "compound symmetry"):
$$\Sigma=\begin{bmatrix}\sigma^2_{wm}+\sigma^2_w&\sigma^2_w &\sigma^2_w\\\sigma^2_w&\sigma^2_{wm}+\sigma^2_w&\sigma^2_w\\\sigma^2_w&\sigma^2_w&\sigma^2_{wm}+\sigma^2_w\end{bmatrix}.$$
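One can check this correspondence on the fitted model (a sketch; the variance components are pulled out of VarCorr by their grouping-factor names):

m_cs  <- lmer(score ~ 1 + Machine + (1 | Worker) + (1 | Worker:Machine), d)
vc    <- as.data.frame(VarCorr(m_cs))
s2_w  <- vc$vcov[vc$grp == "Worker"]
s2_wm <- vc$vcov[vc$grp == "Worker:Machine"]
s2_w * matrix(1, 3, 3) + s2_wm * diag(3)   # the implied compound-symmetry Sigma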
The model that Rune Christensen recommends for uncorrelated factors
lmer(score ~ 1 + Machine + (1 + A + B + C || Worker), d)
fits a covariance matrix with 4 parameters, which is a bit more general than compound symmetry (and only 2 parameters away from the maximal model):
$$\Sigma=\begin{bmatrix}\sigma^2_A+\sigma^2_w&\sigma^2_w &\sigma^2_w\\\sigma^2_w&\sigma^2_B+\sigma^2_w&\sigma^2_w\\\sigma^2_w&\sigma^2_w&\sigma^2_C+\sigma^2_w\end{bmatrix}.$$
The model that you have "until recently" had in mind (your m2) is the model that Reinhold Kliegl recommends as the zero-correlation model:
lmer(score ~ 1 + Machine + (1 + c1 + c2 || Worker), d)
If c1 and c2 were produced using the default treatment contrasts (with A being the reference level), then this model can be written as
lmer(score ~ 1 + Machine + (1 + B + C || Worker), d)
I agree with Rune that this is a somewhat unreasonable model because it treats the factor levels differently: B and C get their own variances but A does not (the corresponding $\Sigma$ would look the same as the one above but without $\sigma^2_A$), whereas all three machines should arguably be treated on the same footing.
Thus, the most reasonable sequence of nested models seems to be:
max model --> comp symmetry w/ unequal vars --> comp symmetry --> rand. intercept
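In lme4 this sequence can be fit and compared with likelihood-ratio tests (a sketch; the model names are mine, the dummy columns A, B, C are the ones constructed above, and anova() refits with ML for the comparison):

m_max <- lmer(score ~ 1 + Machine + (0 + Machine | Worker), d)                 # 6 parameters
m_csu <- lmer(score ~ 1 + Machine + (1 + A + B + C || Worker), d)              # 4: comp. symmetry, unequal vars
m_cs  <- lmer(score ~ 1 + Machine + (1 | Worker) + (1 | Worker:Machine), d)    # 2: comp. symmetry
m_ri  <- lmer(score ~ 1 + Machine + (1 | Worker), d)                           # 1: random intercept
anova(m_ri, m_cs, m_csu, m_max)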
A note on marginal distributions
This post was inspired by Rune Christensen's email here https://stat.ethz.ch/pipermail/r-sig-mixed-models/2018q2/026847.html. He talks about $9\times 9$ marginal covariance matrices for the individual observations within each Worker. I find this more difficult to think about than the presentation above. The covariance matrix from Rune's email can be obtained from any $\Sigma$ as $$\Sigma_\text{marginal} = \mathbf{1}_m\mathbf{1}_m^\top \otimes \Sigma + \sigma^2 I,$$ where $\mathbf{1}_m\mathbf{1}_m^\top$ is the $m\times m$ matrix of ones, $m$ is the number of repetitions per Worker/Machine combination (in this dataset $m=3$), and $\sigma^2$ is the residual variance.
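In code, this marginal matrix can be assembled from any $\Sigma$ with a Kronecker product (a sketch; here $\Sigma$ and $\sigma^2$ are taken from the fitted maximal model above, and the 9 observations of one Worker are ordered with the repetition index as the outer blocks):

Sigma  <- VarCorr(m_max)$Worker      # any 3x3 Sigma, here from the maximal model
sigma2 <- sigma(m_max)^2             # residual variance
m <- 3                               # repetitions per Worker/Machine combination
Sigma_marginal <- kronecker(matrix(1, m, m), Sigma) + sigma2 * diag(3 * m)
Sigma_marginal                       # 9x9 marginal covariance within one Worker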
A note on sum contrasts
@statmerkur is asking about sum contrasts. Indeed, it is often recommended to use sum contrasts (contr.sum), especially when there are interactions in the model. I feel that this does not affect anything that I wrote above. E.g. the maximal model will still fit an unconstrained $\Sigma$, but the interpretation of its entries is going to be different (variances and covariances of the grand mean and of the deviations of A and B from the grand mean). The $\Sigma$ in m2 defined using contr.sum will have a structure analogous to the one for (1 + B + C || Worker) above (an intercept variance plus two independent variances), but again with a different interpretation of the entries. Two further comments (a code sketch for the sum-contrast version follows after them):
Rune's critique of m2 still applies: this random effect structure does not treat A, B, and C on the same footing;
The recommendation to use sum contrasts makes sense for the fixed effects (in the presence of interactions). I don't see a reason to necessarily prefer sum contrasts for the random effects, so I think one can, if one wants to, safely use (1+A+B+C || Worker) even if the fixed part uses sum contrasts.
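For completeness, a sketch of how c1 and c2 can be constructed by hand from contr.sum so that they can be used inside the random-effects formula (the column names are my choice):

contr  <- contr.sum(3)                       # 3x2 matrix of sum-contrast codes
d$c1   <- contr[as.integer(d$Machine), 1]    # codes 1, 0, -1 for A, B, C
d$c2   <- contr[as.integer(d$Machine), 2]    # codes 0, 1, -1 for A, B, C
m2_sum <- lmer(score ~ 1 + Machine + (1 + c1 + c2 || Worker), d)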
A note on custom contrasts
I had an email exchange with Reinhold Kliegl about this answer. Reinhold says that in his applied work he prefers (1+c1+c2 || subject) over (1+A+B+C || subject) because he chose c1 and c2 as meaningful contrasts. He wants to be able to interpret $\Sigma$, and he wants its entries to correspond to c1 and c2.
This basically means that Reinhold is fine with rejecting the assumption (that I made above) that the factor levels should be treated equally. He does not care about the individual factor levels at all! If so, then of course it is fine to use (1+c1+c2 || subject). He gives his paper https://www.frontiersin.org/articles/10.3389/fpsyg.2010.00238/full as an example. There a four-level factor is coded with 3 custom contrasts c1, c2, and c3, with the grand mean as the intercept. These specific contrasts are of interest, not the individual factor levels A to D. In this situation I agree that (1+c1+c2+c3 || subject) makes total sense.
But one should be clear that while (1+c1+c2+c3 | subject) does treat the factor levels A to D equally (it merely re-parametrizes $\Sigma$ in terms of the particular contrasts), (1+c1+c2+c3 || subject) will fail to treat the factor levels equally.
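The same point can be checked on the three-level Machines analogue (a sketch, using the hand-made sum-contrast columns c1 and c2 from above): the "|" version is just a reparametrization of the maximal model and should give the same fit, whereas the "||" version imposes genuine, contrast-dependent constraints.

m_full_dummy <- lmer(score ~ 1 + Machine + (0 + Machine | Worker), d)
m_full_sum   <- lmer(score ~ 1 + Machine + (1 + c1 + c2 | Worker), d)
c(logLik(m_full_dummy), logLik(m_full_sum))   # equal up to convergence tolerance

m_zerocorr   <- lmer(score ~ 1 + Machine + (1 + c1 + c2 || Worker), d)
logLik(m_zerocorr)                            # a genuinely more constrained model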