What does it mean that random effects are highly correlated?

Question

What does it mean when two random effects are highly or perfectly correlated?
That is, in R when you call summary on a mixed model object, under "Random effects" "corr" is 1 or -1.

summary(model.lmer) 
Random effects:
Groups   Name                    Variance   Std.Dev.  Corr                 
popu     (Intercept)             2.5714e-01 0.5070912                      
          amdclipped              4.2505e-04 0.0206167  1.000               
          nutrientHigh            7.5078e-02 0.2740042  1.000  1.000        
          amdclipped:nutrientHigh 6.5322e-06 0.0025558 -1.000 -1.000 -1.000

I know this is bad and indicates that the random effects part of the model is too complex, but I'm trying to understand:

1) what is going on statistically
2) what is going on practically with the structure of the response variables.

Example

Here is an example based on "GLMMs in action: gene-by-environment interaction in total fruit production of wild populations of Arabidopsis thaliana" by Bolker et al

Download data

download.file(url = "http://glmm.wdfiles.com/local--files/trondheim/Banta_TotalFruits.csv", destfile = "Banta_TotalFruits.csv")
dat.tf <- read.csv("Banta_TotalFruits.csv", header = TRUE)

Set up factors

dat.tf <- transform(dat.tf,
                    X        = factor(X),
                    gen      = factor(gen),
                    rack     = factor(rack),
                    amd      = factor(amd, levels = c("unclipped", "clipped")),
                    nutrient = factor(nutrient, labels = c("Low", "High")))

Modeling log(total.fruits+1) with "population" (popu) as random effect

library(lme4)  # provides lmer(), VarCorr() and ranef()
model.lmer <- lmer(log(total.fruits + 1) ~ nutrient*amd + (amd*nutrient | popu), data = dat.tf)

Accessing the correlation matrix of the random effects shows that everything is perfectly correlated

attr(VarCorr(model.lmer)$popu,"correlation")

                         (Intercept) amdclipped nutrientHigh amdclipped:nutrientHigh
(Intercept)                       1          1            1                      -1
amdclipped                        1          1            1                      -1
nutrientHigh                      1          1            1                      -1
amdclipped:nutrientHigh          -1         -1           -1                       1

I understand that these are the correlation coefficients of two vectors of random effects coefficients, such as

cor(ranef(model.lmer)$popu$amdclipped, ranef(model.lmer)$popu$nutrientHigh)

Does a high correlation mean that the two random effects contain redundant information? Is this analogous to multicollinearity in multiple regression, where a model with highly correlated predictors should be simplified?
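One way to look at the redundancy question is to rebuild the covariance matrix implied by the summary output and inspect its eigenvalues. This is only a sketch in base R: the standard deviations are copied from the output above, the sign pattern is read off the printed correlation matrix, and the object names (`sds`, `signs`, `Sigma`, `ev`) are made up for illustration.

```r
# Standard deviations copied from the summary() output in the question.
sds <- c(0.5070912, 0.0206167, 0.2740042, 0.0025558)
term_names <- c("(Intercept)", "amdclipped", "nutrientHigh",
                "amdclipped:nutrientHigh")

# The reported correlations are all +1 or -1 and follow this sign pattern:
# the first three terms move together, the interaction moves against them.
signs <- c(1, 1, 1, -1)
corr  <- outer(signs, signs)

# Covariance matrix implied by those standard deviations and correlations.
Sigma <- diag(sds) %*% corr %*% diag(sds)
dimnames(Sigma) <- list(term_names, term_names)

# Share of the total variance carried by each eigen-direction.
ev <- eigen(Sigma, symmetric = TRUE)$values
print(round(ev / sum(ev), 6))
```

With correlations of exactly +1 or -1 the matrix has rank one, so a single direction carries essentially all of the variance. Read that way, the four random effects per population behave like rescaled copies of one population-level deviation, which is the sense in which they would be redundant.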

One thing I noticed is that the variances in your original example are very small: 6.5e-6 = 0.0000065. It depends a bit on how you scaled your variables, but to me this suggests that either you need to rethink the scale of your variables or your variance is _de facto_ zero — Maarten Buis, Oct 11 '13 at 19:19
I don't seem to be able to replicate your results. Which version of `lme4` are you using? If you are not using version 1.0-4 or newer I would recommend upgrading before anything else. Currently I get a `failure to converge in 10000 evaluations` message. — usεr11852, Oct 11 '13 at 20:39
Can you include the entire model summary or provide a link to it? — Livius, Sep 10 '14 at 18:58

score 3 · Answer 1 · edited Apr 13 '17 at 12:44

I am not 100% sure this answer is correct, but I just ran into the same issue (perfect correlation), and after looking at my own data, here is what I assume is happening.

If there is no variation within your grouping (random) variables, the correlation will be either +1 (if both effects have the same sign) or -1.

So, for your example, I would assume that for each value of amd there is only one value of nutrient. This lack of variation creates the perfect correlation.
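This assumption about the layout can be checked directly by counting how many distinct levels of one variable occur within each level of another. The sketch below uses a small made-up data frame (`toy`, with hypothetical columns `x1` and `x2`), not the data from the question.

```r
# Toy data (hypothetical, not the Banta data set): group "a" only ever sees
# one level of x1, whereas group "b" sees both levels.
toy <- data.frame(
  x2 = c("a", "a", "a", "b", "b", "b"),
  x1 = c("lo", "lo", "lo", "lo", "hi", "hi")
)

# Cross-tabulation: a zero cell means that combination was never observed.
print(table(toy$x2, toy$x1))

# Number of distinct x1 levels observed within each group of x2.
levels_per_group <- sapply(split(toy$x1, toy$x2),
                           function(v) length(unique(v)))

# Groups in which x1 never varies.
constant_groups <- names(levels_per_group)[levels_per_group == 1]
print(constant_groups)
```

For the data in the question, the analogous check would be something like `with(dat.tf, table(popu, amd, nutrient))`. If no cell is empty, the explanation above does not apply and the cause has to be looked for elsewhere.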

I do not necessarily think this is problematic; it chiefly depends on your model's objective. Check out this answer for an excellent explanation of how different random effects are used with lmer.

The main point seems to be this: if you are using a random slope, as in y ~ x3 + (1 + x1 | x2), and you know that x1 is constant within each value of x2, the random slope can still be justified, provided you have good (theoretical or empirical) reason to assume that the effect of x1 on your response variable y differs across the levels of x2.

I know it's an old question but hopefully my suggested answer makes some sense.

Simon
