Singular random-effect covariance matrices
Obtaining a random-effect correlation estimate of +1 or -1 means that the optimization algorithm hit "a boundary": correlations cannot be higher than +1 or lower than -1. Even if there are no explicit convergence errors or warnings, this potentially indicates problems with convergence, because we do not expect true correlations to lie on the boundary. As you said, it usually means that there are not enough data to estimate all the parameters reliably. Matuschek et al. 2017 argue that statistical power can be compromised in this situation.
Another way to hit a boundary is to get a variance estimate of 0: Why do I get zero variance of a random effect in my mixed model, despite some variation in the data?
Both situations can be seen as obtaining a degenerate covariance matrix of random effects (in your example output the covariance matrix is $4\times 4$); a zero variance or a perfect correlation means that the covariance matrix is not full rank and [at least] one of its eigenvalues is zero. This observation immediately suggests that there are other, more complex ways to get a degenerate covariance matrix: one can have a $4\times 4$ covariance matrix without any zero variances or perfect correlations that is nevertheless rank-deficient (singular). Bates et al. 2015 Parsimonious Mixed Models (unpublished preprint) recommend using principal component analysis (PCA) to check whether the obtained covariance matrix is singular. If it is, they suggest treating this situation the same way as the explicit boundary cases above.
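In lme4 this check can be run directly on a fitted model. A minimal sketch, assuming a reasonably recent lme4 version and hypothetical data `d` with response `Y`:

```r
library(lme4)

## fit the maximal model from the example (Y and d are hypothetical names)
m <- lmer(Y ~ X * Cond + (X * Cond | subj), data = d)

## isSingular() flags boundary fits: zero variances, +/-1 correlations,
## or a more general rank deficiency of the random-effect covariance matrix
isSingular(m)

## rePCA() runs the PCA of the random-effect covariance structure described by
## Bates et al. 2015; components with (near-)zero standard deviation indicate singularity
summary(rePCA(m))
```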
So what to do?
If there are not enough data to estimate all the parameters of a model reliably, then we should consider simplifying the model. Taking your example model, `X*Cond + (X*Cond|subj)`, there are various possible ways to simplify it (the corresponding `lmer` calls are sketched after the list):
1. Remove one of the random effects, usually the highest-order interaction: `X*Cond + (X+Cond|subj)`
2. Get rid of all the correlation parameters: `X*Cond + (X*Cond||subj)`
   Update: as @Henrik notes, the `||` syntax will only remove correlations if all variables to the left of it are numerical. If categorical variables (such as `Cond`) are involved, one should rather use his convenient `afex` package (or cumbersome manual workarounds). See his answer for more details.
3. Get rid of some of the correlation parameters by breaking the random-effects term into several, e.g.: `X*Cond + (X+Cond|subj) + (0+X:Cond|subj)`
4. Constrain the covariance matrix in some specific way, e.g. by setting one specific correlation (the one that hit the boundary) to zero, as you suggest. There is no built-in way in `lme4` to achieve this. See @BenBolker's answer on SO for a demonstration of how to do it via some smart hacking.
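For concreteness, here is a minimal sketch of what options 1-3 look like as `lmer` calls, again assuming the hypothetical data `d` and response `Y` (for option 2, remember that `||` only drops the correlations when the predictors are numeric):

```r
library(lme4)

m1 <- lmer(Y ~ X * Cond + (X + Cond | subj), data = d)                        # option 1
m2 <- lmer(Y ~ X * Cond + (X * Cond || subj), data = d)                       # option 2
m3 <- lmer(Y ~ X * Cond + (X + Cond | subj) + (0 + X:Cond | subj), data = d)  # option 3

VarCorr(m1)  # inspect the estimated random-effect SDs and correlations of each fit
```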
Contrary to what you said, I don't think Matuschek et al. 2017 specifically recommend #4. The gist of Matuschek et al. 2017 and Bates et al. 2015 seems to be that one starts with the maximal model à la Barr et al. 2013 and then decreases the complexity until the covariance matrix is full rank. (Moreover, they would often recommend reducing the complexity even further, in order to increase the power.) Update: in contrast, Barr et al. recommend reducing complexity ONLY if the model did not converge; they are willing to tolerate singular covariance matrices. See @Henrik's answer.
If one agrees with Bates/Matuschek, then I think it is fine to try out different ways of decreasing the complexity in order to find the one that does the job while doing "the least damage". Looking at my list above, the original covariance matrix has 10 parameters; #1 has 6 parameters, #2 has 4 parameters, #3 has 7 parameters. Which model will get rid of the perfect correlations is impossible to say without fitting them.
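One way to verify these parameter counts (which assume numeric `X` and `Cond`) is to look at the length of the `theta` vector, i.e. the Cholesky parameters of the random-effect covariance structure, for each fitted model; a sketch, assuming the hypothetical fits `m`, `m1`, `m2`, `m3` from the snippets above:

```r
## theta holds the covariance parameters: n*(n+1)/2 per unstructured block of size n
length(getME(m,  "theta"))   # 10 for (X*Cond | subj)
length(getME(m1, "theta"))   #  6 for (X+Cond | subj)
length(getME(m2, "theta"))   #  4 for (X*Cond || subj), with numeric predictors
length(getME(m3, "theta"))   #  7 for (X+Cond | subj) + (0+X:Cond | subj)
```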
But what if you are interested in this parameter?
The above discussion treats the random-effect covariance matrix as a nuisance parameter. You raise an interesting question of what to do if you are specifically interested in a correlation parameter that you have to "give up" in order to get a meaningful full-rank solution.
Note that fixing a correlation parameter at zero will not necessarily yield BLUPs (`ranef`) that are uncorrelated; in fact, they might not even be affected much at all (see @Placidia's answer for a demonstration). So one option would be to look at the correlations of the BLUPs and report those.
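A minimal sketch of that, assuming the hypothetical zero-correlation fit `m2` from above:

```r
## extract the BLUPs (conditional modes) for the subject-level random effects
blups <- ranef(m2)$subj   # one row per subject, one column per random-effect term
round(cor(blups), 2)      # empirical correlations among the BLUPs,
                          # even though the model constrained the correlations to 0
```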
Another, perhaps less attractive, option would be to treat `subject` as a fixed effect, `Y ~ X*cond*subj`, get the estimates for each subject, and compute the correlations between them. This is equivalent to running a separate `Y ~ X*cond` regression for each subject and getting the correlation estimates from those.
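A minimal sketch of this per-subject approach, using `lmList()` from lme4 on the hypothetical data `d` (any equivalent loop of `lm()` fits would do):

```r
library(lme4)

## fit a separate Y ~ X*cond regression for each subject
fits <- lmList(Y ~ X * cond | subj, data = d)

est <- coef(fits)     # data frame: one row per subject, one column per coefficient
round(cor(est), 2)    # correlations between the per-subject coefficient estimates
```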
See also the section on singular models in Ben Bolker's mixed model FAQ:
> It is very common for overfitted mixed models to result in singular fits. Technically, singularity means that some of the $\theta$ (variance-covariance Cholesky decomposition) parameters corresponding to diagonal elements of the Cholesky factor are exactly zero, which is the edge of the feasible space, or equivalently that the variance-covariance matrix has some zero eigenvalues (i.e. is positive semidefinite rather than positive definite), or (almost equivalently) that some of the variances are estimated as zero or some of the correlations are estimated as +/-1.
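These checks can be done by hand on a fitted model; a sketch, again assuming the hypothetical fit `m` and grouping factor `subj` from above:

```r
## diagonal Cholesky elements are the theta entries with lower bound 0;
## (near-)zero values there mean the fit is on the boundary
theta <- getME(m, "theta")
any(theta[getME(m, "lower") == 0] < 1e-4)

## equivalently, look for (near-)zero eigenvalues of the estimated covariance matrix
eigen(VarCorr(m)$subj, only.values = TRUE)$values
```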