Why do I get zero variance of a random effect in my mixed model, despite some variation in the data?

Question

We’ve run a mixed-effects logistic regression using the following syntax:

# fit model (glmer() is provided by the lme4 package)
library(lme4)
fm0 <- glmer(GoalEncoding ~ 1 + Group + (1|Subject) + (1|Item), data = exp0,
             family = binomial(link="logit"))
# model output
summary(fm0)

Subject and Item are the random effects. We’re getting an odd result: the variance and standard deviation for the Subject term are both zero:

Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial  ( logit )
Formula: GoalEncoding ~ 1 + Group + (1 | Subject) + (1 | Item)
Data: exp0

AIC      BIC      logLik deviance df.resid 
449.8    465.3   -220.9    441.8      356 

Scaled residuals: 
Min     1Q Median     3Q    Max 
-2.115 -0.785 -0.376  0.805  2.663 

Random effects:
Groups  Name        Variance Std.Dev.
Subject (Intercept) 0.000    0.000   
Item    (Intercept) 0.801    0.895   
Number of obs: 360, groups:  Subject, 30; Item, 12

Fixed effects:
                Estimate Std. Error z value Pr(>|z|)    
 (Intercept)     -0.0275     0.2843    -0.1     0.92    
 GroupGeMo.EnMo   1.2060     0.2411     5.0  5.7e-07 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Correlation of Fixed Effects:
             (Intr)
 GroupGM.EnM -0.002

This should not be happening because obviously there is variation across subjects. When we run the same analysis in Stata

xtmelogit goal group_num || _all:R.subject || _all:R.item

Note: factor variables specified; option laplace assumed

Refining starting values: 

Iteration 0:   log likelihood = -260.60631  
Iteration 1:   log likelihood = -252.13724  
Iteration 2:   log likelihood = -249.87663  

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -249.87663  
Iteration 1:   log likelihood = -246.38421  
Iteration 2:   log likelihood =  -245.2231  
Iteration 3:   log likelihood = -240.28537  
Iteration 4:   log likelihood = -238.67047  
Iteration 5:   log likelihood = -238.65943  
Iteration 6:   log likelihood = -238.65942  

Mixed-effects logistic regression               Number of obs      =       450
Group variable: _all                            Number of groups   =         1

                                                Obs per group: min =       450
                                                               avg =     450.0
                                                               max =       450

Integration points =   1                        Wald chi2(1)       =     22.62
Log likelihood = -238.65942                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        goal |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   group_num |   1.186594    .249484     4.76   0.000     .6976147    1.675574
       _cons |  -3.419815   .8008212    -4.27   0.000    -4.989396   -1.850234
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
_all: Identity               |
               sd(R.subject) |   7.18e-07   .3783434             0           .
-----------------------------+------------------------------------------------
_all: Identity               |
                 sd(R.trial) |   2.462568   .6226966      1.500201    4.042286
------------------------------------------------------------------------------
LR test vs. logistic regression:     chi2(2) =   126.75   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
Note: log-likelihood calculations are based on the Laplacian approximation.

the results are as expected, with a non-zero estimate and standard error for the Subject term.

Originally we thought this might be something to do with the coding of the Subject term, but changing this from a string to an integer did not make any difference.

Obviously the analysis is not working properly, but we are unable to pin down the source of the difficulties. (NB: someone else on this forum has been experiencing a similar issue, but that thread remains unanswered: link to question.)

You say this shouldn't be happening because "obviously there is variation across subjects", but since we don't know what `subject` is or anything else about these variables, it's not so "obvious" to us! Also, the "non-zero coefficient for the subject term" from your Stata analysis is 7.18e-07! I guess technically it's "non-zero", but it's not too far from 0 either...! — smillig, Sep 11 '14 at 10:33
Many thanks for observations. Subjects are participants in a study and there is bound to be variation in performance. Mean scores were 39% correct, with a standard deviation of 11%. I would expect this to appear as greater than 0.000 in the reported statistics, but may be wrong. Yes, of course 7.18e-07 is equivalent to 0.000, and 0.000 is not necessarily zero. — Nick Riches, Sep 11 '14 at 12:04
How many times was each subject tested/sampled? Without knowing the substantive aspects of your research, if Stata tells you that the variation within subjects is 0.000000718 (with a standard error of 0.378) and R tells you that it's 0.000, isn't the story here that there really isn't any variation at the subject level? Also note that Stata doesn't give you a confidence interval for the subject variation. — smillig, Sep 11 '14 at 12:30
Thanks again for comments. Subjects were tested on 11 occasions. I guess this means that once group and item effects are accounted for, there is very little variation across participants. It looks a bit "suspect", but I guess there is consistency across the two different analyses? — Nick Riches, Sep 11 '14 at 14:49

score 32 · Accepted Answer · edited May 10 '17 at 08:07

This is discussed at some length at https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html (search for "singular models"); it's common, especially when there is a small number of groups (although 30 is not particularly small in this context).

One difference between lme4 and many other packages is that many packages, including lme4's predecessor nlme, handle the fact that variance estimates must be non-negative by fitting variance on the log scale: that means that variance estimates can't be exactly zero, just very very small. lme4, in contrast, uses constrained optimization, so it can return values that are exactly zero (see http://arxiv.org/abs/1406.5823 p. 24 for more discussion). http://rpubs.com/bbolker/6226 gives an example.
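To see why the two parameterizations behave differently at the boundary, here is a toy sketch in base R. The objective `toy_obj` is hypothetical (a quadratic whose unconstrained minimum sits at a negative "variance"); it is not lme4's deviance function, and `optim()` merely stands in for whatever optimizers the packages actually use.

```r
# Toy objective (hypothetical, NOT lme4's deviance): as a function of a
# variance v it is minimized at v = -0.5, which lies outside the valid range.
toy_obj <- function(v) (v + 0.5)^2

# (a) Constrained optimization on the variance scale, with v >= 0.
fit_box <- optim(par = 1, fn = toy_obj, method = "L-BFGS-B", lower = 0)
v_box <- fit_box$par

# (b) Unconstrained optimization on the log scale, v = exp(eta).
fit_log <- optim(par = 0, fn = function(eta) toy_obj(exp(eta)), method = "BFGS")
v_log <- exp(fit_log$par)

# The box-constrained fit can sit exactly on the boundary; the log-scale
# fit can only approach it, because exp(eta) is always positive.
print(c(constrained = v_box, log_scale = v_log))
```

In lme4 itself, `isSingular(fm0)` reports whether a fitted model sits on this kind of boundary.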

In particular, looking closely at your among-subject results from Stata, you have an estimated standard deviation of 7.18e-07 (relative to an intercept of -3.4) with a Wald standard error of .3783434 (essentially useless in this case!) and a 95% CI whose lower bound is listed as "0" with no upper bound; this is technically "non-zero", but it's as close to zero as the program will report ...

It's well known and theoretically provable (e.g. Stram and Lee Biometrics 1994) that the null distribution for variance components is a mixture of a point mass ('spike') at zero and a chi-squared distribution away from zero. Unsurprisingly (but I don't know if it's proven/well known), the sampling distribution of the variance component estimates often has a spike at zero even when the true value is not zero -- see e.g. http://rpubs.com/bbolker/4187 for an example, or the last example in the ?bootMer page:

library(lme4)
library(boot)
## Check stored values from a longer (1000-replicate) run:
load(system.file("testdata","boo01L.RData",package="lme4"))
plot(boo01L,index=3)

[Figure: output of plot(boo01L, index=3)]
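For a likelihood-ratio test of a single variance component, the mixture described above is usually taken to be a 50:50 mix of a point mass at zero and a chi-squared distribution with 1 degree of freedom, which halves the naive p-value. The helper `p_boundary` and the statistic value below are hypothetical. This is an asymptotic approximation, and how well it holds for a GLMM with crossed random effects like the one in the question is an assumption.

```r
# p-value for a likelihood-ratio statistic when the null hypothesis puts ONE
# variance component on the boundary (variance = 0). Assumes the asymptotic
# 0.5 * chisq(0) + 0.5 * chisq(1) mixture; `p_boundary` is a hypothetical helper.
p_boundary <- function(lrt) {
  stopifnot(lrt >= 0)
  if (lrt == 0) return(1)  # the point mass at zero covers lrt = 0
  0.5 * pchisq(lrt, df = 1, lower.tail = FALSE)
}

lrt <- 2.5  # made-up statistic, for illustration only
p_naive <- pchisq(lrt, df = 1, lower.tail = FALSE)  # ignores the boundary
p_mix <- p_boundary(lrt)                             # accounts for the spike at zero
print(c(naive = p_naive, mixture = p_mix))
```

The naive chi-squared reference gives the larger p-value, which fits the "LR test is conservative" note in the Stata output.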

+1. Another good answer is in the sister thread: https://stats.stackexchange.com/a/34979 (I am leaving this link for future readers). — amoeba, Jun 21 '17 at 07:57
The spike for small but non-zero true values would be expected by contiguity: asymptotically, if an event has non-zero probability for one sequence of distributions it must have non-zero probability for all contiguous sequences, i.e., when the true variance component is of order $1/\sqrt{n}$. — Thomas Lumley, Feb 28 '22 at 21:54

score 18 · Answer 2 · answered Sep 11 '14 at 19:28

I don't think there's a problem. The lesson from the model output is that although there is "obviously" variation in subject performance, the extent of this subject variation can be fully or virtually-fully explained by just the residual variance term alone. There is not enough additional subject-level variation to warrant adding an additional subject-level random effect to explain all the observed variation.

Think of it this way. Imagine we are simulating experimental data under this same paradigm. We set up the parameters so that there is residual variation on a trial-by-trial basis, but 0 subject-level variation (i.e., all subjects have the same "true mean," plus error). Now each time we simulate data from this set of parameters, we will of course find that subjects do not have exactly equal performance. Some end up with low scores, some with high scores. But this is all just because of the residual trial-level variation. We "know" (by virtue of having determined the simulation parameters) that there is not really any subject-level variation.
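The thought experiment above can be sketched in base R. This is a simplified version that ignores the Group and Item effects. The 30 subjects come from the model output, 12 trials per subject is an assumption (360 observations / 30 subjects), and 0.39 is the mean score quoted in the comments, used here as every subject's shared true mean.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

n_subj   <- 30    # subjects, as in the lme4 output
n_trials <- 12    # trials per subject (360 obs / 30 subjects); an assumption
p_true   <- 0.39  # one shared "true mean" for every subject

# One simulated experiment: per-subject proportion correct.
prop <- rbinom(n_subj, size = n_trials, prob = p_true) / n_trials
sd_one <- sd(prop)

# Repeat many times to see the typical between-subject SD when the
# subject-level variance is truly zero.
sd_sims <- replicate(1000, sd(rbinom(n_subj, n_trials, p_true) / n_trials))

# SD expected from binomial trial-level noise alone.
sd_binom <- sqrt(p_true * (1 - p_true) / n_trials)

print(c(one_run = sd_one, mean_of_runs = mean(sd_sims), theory = sd_binom))
```

Under these assumptions the theoretical SD is about 0.14, which is larger than the 11% between-subject SD reported in the comments, so that spread on its own does not indicate any subject-level variance.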
