
Suppose I have $N$ participants, each of whom gives a response $Y$ 20 times, 10 in one condition and 10 in another. I fit a linear mixed effects model comparing $Y$ in each condition. Here's a reproducible example simulating this situation using the lme4 package in R:

library(lme4)
fml <- "~ condition + (condition | participant_id)"
d <- expand.grid(participant_id=1:40, trial_num=1:10)
d <- rbind(cbind(d, condition="control"), cbind(d, condition="experimental"))

set.seed(23432)
d <- cbind(d, simulate(formula(fml), 
                       newparams=list(beta=c(0, .5), 
                                      theta=c(.5, 0, 0), 
                                      sigma=1), 
                       family=gaussian, 
                       newdata=d))

m <- lmer(paste("sim_1 ", fml), data=d)
summary(m)

The model m yields two fixed effects (an intercept and slope for condition), and three random effects (a by-participant random intercept, a by-participant random slope for condition, and an intercept-slope correlation).
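For reference, these estimated components can be pulled directly out of the fitted object, e.g.:

fixef(m)                  # the two fixed effects (intercept and condition slope)
VarCorr(m)                # random-effect standard deviations and the intercept-slope correlation
as.data.frame(VarCorr(m)) # the same components as a data frame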

I would like to statistically compare the size of the by-participant random intercept variance across the groups defined by condition (i.e., compute the variance component highlighted in red separately within the control and experimental conditions, then test whether the difference in the size of the components is non-zero). How would I do this (preferably in R)?

[Image: summary(m) output with the by-participant random intercept variance highlighted in red]


BONUS

Let's say the model is slightly more complicated: The participants each experience 10 stimuli 20 times each, 10 in one condition and 10 in another. Thus, there are two sets of crossed random effects: random effects for participant and random effects for stimulus. Here's a reproducible example:

library(lme4)
fml <- "~ condition + (condition | participant_id) + (condition | stimulus_id)"
d <- expand.grid(participant_id=1:40, stimulus_id=1:10, trial_num=1:10)
d <- rbind(cbind(d, condition="control"), cbind(d, condition="experimental"))

set.seed(23432)
d <- cbind(d, simulate(formula(fml), 
                       newparams=list(beta=c(0, .5), 
                                      theta=c(.5, 0, 0, .5, 0, 0), 
                                      sigma=1), 
                       family=gaussian, 
                       newdata=d))

m <- lmer(paste("sim_1 ", fml), data=d)
summary(m)

I would like to statistically compare the magnitude of the random by-participant intercept variance across the groups defined by condition. How would I do that, and is the process any different from the one in the situation described above?


EDIT

To be a bit more specific about what I'm looking for, I want to know:

  1. Is the question, "are the conditional mean responses within each condition (i.e., random intercept values in each condition) substantially different from each other, beyond what we would expect due to sampling error" a well-defined question (i.e., is this question even theoretically answerable)? If not, why not?
  2. If the answer to question (1) is yes, how would I answer it? I would prefer an R implementation, but I'm not tied to the lme4 package -- for example, it seems as though the OpenMx package has the capability to accommodate multi-group and multi-level analyses (https://openmx.ssri.psu.edu/openmx-features), and this seems like the sort of question that ought to be answerable in an SEM framework.
Patrick S. Forscher
  • "I would like to statistically compare the size of the by-participant random slope in the groups defined by condition." What do you mean by the size of the by-participant random slope, and by condition? The variance around the slope (0.07726) is the variance of the effect of condition, so I'm not sure how you could compare it across conditions? – Mark White Aug 11 '18 at 02:25
  • This gets us the slope for participant 1 and participant 2. Are you looking to compare these two slopes, for example? `ranef(m)$participant_id[1, 2]; ranef(m)$participant_id[2, 2]` – Mark White Aug 11 '18 at 02:27
  • @MarkWhite, I've updated the question in response to your comments. I mean that I want to compare the standard deviation of the participant intercepts when they give responses in the control condition vs when they give responses in the experimental condition. I want to do this statistically, i.e., test if the difference in the standard deviation of the intercepts is different from 0. – Patrick S. Forscher Aug 11 '18 at 03:37
  • Thanks. Interesting and unique question. Do you need to be modeling condition at the same time? It seems like you could have two models: One with just condition = control, another where just condition = experimental. You do intercept-only: `y ~ 1 + (1 | id)`. Then you could subtract the two variances of the intercepts and either bootstrap or specify these models in Stan and sample from the posterior. This would just be the variance of the response in each condition, so why not just compare variances across conditions? – Mark White Aug 11 '18 at 03:49
  • Hmm, I think something like that could work. If you have the time to add a full answer using the data from my reproducible example (maybe using the `boot` package?) I'll likely accept it! – Patrick S. Forscher Aug 11 '18 at 04:01
  • I wrote up an answer, but will sleep on it because I am not sure it is very useful. It comes down to this: I don't think one can do what you are asking. The random effect of the intercept is the variance in the participants' means when they are in the control condition. So one cannot look at the variance of those for observations in the experimental condition. The intercepts are defined at the person level, and the condition is at the observation level. If you are trying to compare variances between conditions, I would think about conditionally heteroscedastic models. – Mark White Aug 11 '18 at 04:31
  • It might help if you explain what hypothesis you are trying to test in your actual data—perhaps it can be answered in a simpler way than comparing variances of the intercepts. – Mark White Aug 11 '18 at 04:40
  • I'm working on a revise & resubmit for a paper where I have participants who give responses to sets of stimuli. Each participant is exposed to multiple conditions and each stimulus receives a response in multiple conditions -- in other words, my study emulates the setup I describe in my "BONUS" description. In one of my graphs, it appears that the average participant response has greater variability in one of the conditions than the others. A reviewer asked me to test whether this is true. – Patrick S. Forscher Aug 11 '18 at 04:51
  • If it is indeed the case that the reviewer's question is ill-posed, that's fine. I would just need a good explanation for why this is true. – Patrick S. Forscher Aug 11 '18 at 04:53
  • Please see https://stats.stackexchange.com/questions/322213/ for how to set up a lme4 model with different variance parameters for each level of a grouping variable. I'm not sure how to do a hypothesis test on whether two variance parameters are equal; personally, I would always prefer to bootstrap over subjects and stimuli to get a confidence interval, or maybe to set up some kind of permutation-like (resampling-based) hypothesis test. – amoeba Aug 11 '18 at 08:43
  • I find sentences like "are the random intercepts substantially different from each other" unclear, and keep replacing them with something like "are the random intercept variances...". – amoeba Aug 12 '18 at 19:36
  • @amoeba, thanks! The terminology for these statistical quantities is indeed confusing – Patrick S. Forscher Aug 12 '18 at 19:40
  • So in your simpler case with only random participants, a lme4 model `sim_1 ~ condition + (0 + dummy(condition, "control") | participant_id) + (0 + dummy(condition, "experimental") | participant_id)` will estimate two separate variances. They come out as 0.49 and 0.45. From here, I would either bootstrap subjects to get a confidence interval for the difference (0.04) or permute subjects to get the p-value. However, this model lacks a correlation parameter. I am not quite sure how to add it. – amoeba Aug 12 '18 at 19:52
  • I wonder how much the correlation parameter matters, even in a case where the "true" value is non-zero. In my experience fixing it to 0 seldom changes the other model estimates much – Patrick S. Forscher Aug 12 '18 at 19:55
  • I agree. This also generalizes easily to the Bonus situation with two random terms. – amoeba Aug 12 '18 at 20:15
  • (BTW, when running your Bonus example, I noticed that I get correlations -1 and 0.95 even though the simulation code seems to be setting them at zero. Not sure why this happens.) – amoeba Aug 12 '18 at 20:17
  • In my experience `lme4` often reports the correlations as -1 or 1 (or values close to that) when their estimates are very close to 0. I'm not sure why that is exactly. The upshot is that I usually don't take the specific values of the correlations seriously when using `lme4` ... which I guess brings us back to the point that it might not be a big deal if the solution assumes that these values are 0 :) – Patrick S. Forscher Aug 12 '18 at 20:42
  • This is weird. What exactly is `theta` in `newparams` in `simulate`? Are you sure that zero there means zero correlation? I have no idea what parameterization `theta` is using. It might be the Cholesky factorization of the random covariance matrix, in which case I'm not sure what 0 means. – amoeba Aug 12 '18 at 20:53
  • Well I had thought that `beta` were the fixed effects, `theta` the random effects, and `sigma` the residual error, but I admit the help file for `simulate.merMod` is a little cryptic https://www.rdocumentation.org/packages/lme4/versions/1.1-17/topics/simulate.merMod – Patrick S. Forscher Aug 12 '18 at 21:03
  • I think you're right that it's the Cholesky factorization: http://rstudio-pubs-static.s3.amazonaws.com/133409_8567e08df08d42a8977853e4f94ca005.html Weird that this isn't in the lme4 documentation – Patrick S. Forscher Aug 12 '18 at 21:11
  • Right. But the Cholesky factor [.5, 0; 0, 0] should yield a covariance matrix with zero correlation... I thought maybe it's because one of the covariances is set to zero, and tried a simulation with `theta=(.5, .1, 0)` and 1000 participants. I still get an estimated correlation of 1 (but correct standard deviations .5 and .1). This is very strange, I think we should ask a separate question about it. – amoeba Aug 12 '18 at 21:59
  • I'd be very curious about the answer as well. I think Ben Bolker is active either here or on StackOverflow – Patrick S. Forscher Aug 12 '18 at 22:01
  • The theta parameter is explained here https://stats.stackexchange.com/questions/155424/how-does-lme4-optimize-the-variance-parameters-of-random-effects-theta-vector If you plot the profile likelihood `xyplot(profile(lmer(paste("sim_1 ", fml), data=d)))` then you see how the estimate for the correlation-related $\theta_2$ term has a very flat profile, and basically the correlation parameter is all over the place. The $\theta_2$ parameter also has a lot of interaction with $\theta_1$ and $\theta_3$ (2d plot `splom(profile(lmer(paste("sim_1 ", fml), data=d)))` ). – Sextus Empiricus Aug 13 '18 at 09:53
  • Note you can also use the profile likelihood to 'solve' your question/problem (instead of doing bootstrapping) https://stats.stackexchange.com/questions/77528/what-is-the-relationship-between-profile-likelihood-and-confidence-intervals – Sextus Empiricus Aug 13 '18 at 09:57
  • https://github.com/lme4/lme4/issues/480 – amoeba Aug 13 '18 at 11:13
  • @MartijnWeterings Thanks for the links. I read the first link and got even more confused. See the github issue linked above. – amoeba Aug 13 '18 at 12:02
  • @amoeba The correlation parameter is $$\rho = \frac{\theta_2}{\sqrt{\theta_2^2+\theta_3^2}},$$ so the issue occurs when the parameter $\theta_3$ is estimated at zero. I can imagine that the optimizer has problems with the likelihood near the boundary as well as with the correlation between the different thetas. Anyway, apparently when you set `control=lmerControl(boundary.tol = 1e-1, restart_edge = TRUE)` then the correlation will be estimated much more reasonably. – Sextus Empiricus Aug 13 '18 at 12:32
  • I agree with @MarkWhite 's comment that the question "are the random intercept variances substantially different from each other..." is at best unclear and at worst nonsensical, because the intercept necessarily refers to Y-values in one specific group (the group assigned the value of 0), so comparing "intercepts" across groups strictly speaking doesn't make sense. I think a better way to rephrase your question, as I understand it, would be something like: "are the variances of participants' conditional mean responses in condition A vs. condition B unequal?" – Jake Westfall Aug 14 '18 at 03:06
  • @JakeWestfall, fair enough! I've made some updates to the question. – Patrick S. Forscher Aug 14 '18 at 03:23
  • MartijnWeterings, Henrik, amoeba: I ended up accepting JakeWestfall's answer because I think it is the best answer to my original question. But I have to say that I found all your answers interesting and insightful. I feel like I learned a lot from your answers and I want to express my deep appreciation for them! – Patrick S. Forscher Aug 19 '18 at 06:24
  • I'm not certain that your question can be answered by looking at the estimated covariance matrix of the random effects. I.e., random-effects models have direct connections with a Bayesian way of thinking. The covariance matrix you get in the output is the covariance matrix of the prior distribution of the random effects, i.e., how variable the subjects are a priori, without seeing their data. Logically, you would want to say something about the posterior of the random effects given the observed outcome data. Here this will still be a normal distribution but with a more complicated covariance matrix. – Dimitris Rizopoulos Aug 30 '18 at 05:03
  • In my answer I show a different way to parameterize a model (see `mod3` and `mod4`) that you can use to test your hypothesis of equal variances. Maybe that's useful, too. – statmerkur Dec 03 '19 at 11:52
  • @DimitrisRizopoulos do you think comparing the variances of the BLUPs in the two conditions would be a better way to test the OP's hypothesis? – statmerkur Dec 03 '19 at 11:57

5 Answers


There's more than one way to test this hypothesis. For example, the procedure outlined by @amoeba should work. But it seems to me that the simplest, most expedient way to test it is using a good old likelihood ratio test comparing two nested models. The only potentially tricky part of this approach is in knowing how to set up the pair of models so that dropping out a single parameter will cleanly test the desired hypothesis of unequal variances. Below I explain how to do that.

Short answer

Switch to contrast (sum to zero) coding for your independent variable and then do a likelihood ratio test comparing your full model to a model that forces the correlation between random slopes and random intercepts to be 0:

# switch to numeric (not factor) contrast codes
d$contrast <- 2*(d$condition == 'experimental') - 1

# reduced model without correlation parameter
mod1 <- lmer(sim_1 ~ contrast + (contrast || participant_id), data=d)

# full model with correlation parameter
mod2 <- lmer(sim_1 ~ contrast + (contrast | participant_id), data=d)

# likelihood ratio test
anova(mod1, mod2)

Visual explanation / intuition

In order for this answer to make sense, you need to have an intuitive understanding of what different values of the correlation parameter imply for the observed data. Consider the (randomly varying) subject-specific regression lines. Basically, the correlation parameter controls whether the participant regression lines "fan out to the right" (positive correlation) or "fan out to the left" (negative correlation) relative to the point $X=0$, where X is your contrast-coded independent variable. Either of these patterns implies unequal variance in participants' conditional mean responses. This is illustrated below:

[Figure: subject-specific regression lines under a negative, zero, and positive slope-intercept correlation]

In this plot, we ignore the multiple observations that we have for each subject in each condition and instead just plot each subject's two random means, with a line connecting them, representing that subject's random slope. (This is made up data from 10 hypothetical subjects, not the data posted in the OP.)

In the column on the left, where there's a strong negative slope-intercept correlation, the regression lines fan out to the left relative to the point $X=0$. As you can see clearly in the figure, this leads to a greater variance in the subjects' random means in condition $X=-1$ than in condition $X=1$.

The column on the right shows the reverse, mirror image of this pattern. In this case there is greater variance in the subjects' random means in condition $X=1$ than in condition $X=-1$.

The column in the middle shows what happens when the random slopes and random intercepts are uncorrelated. This means that the regression lines fan out to the left exactly as much as they fan out to the right, relative to the point $X=0$. This implies that the variances of the subjects' means in the two conditions are equal.

It's crucial here that we've used a sum-to-zero contrast coding scheme, not dummy codes (that is, not setting the groups at $X=0$ vs. $X=1$). It is only under the contrast coding scheme that we have this relationship wherein the variances are equal if and only if the slope-intercept correlation is 0. The figure below tries to build that intuition:

[Figure: the same dataset coded with contrast codes (left column) and dummy codes (right column), with the implied random intercepts and slope-intercept correlations]

What this figure shows is the same exact dataset in both columns, but with the independent variable coded two different ways. In the column on the left we use contrast codes -- this is exactly the situation from the first figure. In the column on the right we use dummy codes. This alters the meaning of the intercepts -- now the intercepts represent the subjects' predicted responses in the control group. The bottom panel shows the consequence of this change, namely, that the slope-intercept correlation is no longer anywhere close to 0, even though the data are the same in a deep sense and the conditional variances are equal in both cases. If this still doesn't seem to make much sense, studying this previous answer of mine where I talk more about this phenomenon may help.
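To see the same phenomenon on the simulated data from the question, you can fit the identical random-effects structure under both codings and compare the estimated slope-intercept correlations (a rough sketch; `m_dummy` is just an illustrative name, and `mod2` is the contrast-coded full model from the short answer above):

# treatment (dummy) codes: the intercept now refers to the control condition
d$dummy <- as.numeric(d$condition == 'experimental')
m_dummy <- lmer(sim_1 ~ dummy + (dummy | participant_id), data=d)

VarCorr(m_dummy) # slope-intercept correlation under dummy coding
VarCorr(mod2)    # same data under contrast coding: correlation near 0 <=> near-equal variances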

Proof

Let $y_{ijk}$ be the $j$th response of the $i$th subject under condition $k$. (We have only two conditions here, so $k$ is just either 1 or 2.) Then the mixed model can be written $$ y_{ijk} = \alpha_i + \beta_ix_k + e_{ijk}, $$ where $\alpha_i$ are the subjects' random intercepts and have variance $\sigma^2_\alpha$, $\beta_i$ are the subjects' random slopes and have variance $\sigma^2_\beta$, $e_{ijk}$ is the observation-level error term, and $\text{cov}(\alpha_i, \beta_i)=\sigma_{\alpha\beta}$.

We wish to show that $$ \text{var}(\alpha_i + \beta_ix_1) = \text{var}(\alpha_i + \beta_ix_2) \Leftrightarrow \sigma_{\alpha\beta}=0. $$

Beginning with the left hand side of this implication, we have $$ \begin{aligned} \text{var}(\alpha_i + \beta_ix_1) &= \text{var}(\alpha_i + \beta_ix_2) \\ \sigma^2_\alpha + x^2_1\sigma^2_\beta + 2x_1\sigma_{\alpha\beta} &= \sigma^2_\alpha + x^2_2\sigma^2_\beta + 2x_2\sigma_{\alpha\beta} \\ \sigma^2_\beta(x_1^2 - x_2^2) + 2\sigma_{\alpha\beta}(x_1 - x_2) &= 0. \end{aligned} $$

Sum-to-zero contrast codes imply that $x_1 + x_2 = 0$ and $x_1^2 = x_2^2 = x^2$. Then we can further reduce the last line of the above to $$ \begin{aligned} \sigma^2_\beta(x^2 - x^2) + 2\sigma_{\alpha\beta}(x_1 - (-x_1)) &= 0 \\ 4x_1\sigma_{\alpha\beta} &= 0 \\ \sigma_{\alpha\beta} &= 0, \end{aligned} $$ where the last step uses $x_1 \neq 0$. This is what we wanted to prove. (To establish the other direction of the implication, we can just follow these same steps in reverse.)

To reiterate, this shows that if the independent variable is contrast (sum to zero) coded, then the variances of the subjects' random means in each condition are equal if and only if the correlation between random slopes and random intercepts is 0. The key take-away point from all this is that testing the null hypothesis that $\sigma_{\alpha\beta} = 0$ will test the null hypothesis of equal variances described by the OP.

This does NOT work if the independent variable is, say, dummy coded. Specifically, if we plug the values $x_1=0$ and $x_2=1$ into the equations above, we find that $$ \text{var}(\alpha_i) = \text{var}(\alpha_i + \beta_i) \Leftrightarrow \sigma_{\alpha\beta} = -\frac{\sigma^2_\beta}{2}. $$

Jake Westfall
  • This is already a terrific answer, thank you! I think this comes closest to answering my question, so I'm accepting it and giving you the bounty (it's about to expire), but I'd love to see an algebraic justification if you have the time and energy for it. – Patrick S. Forscher Aug 19 '18 at 06:29
  • @PatrickS.Forscher I just added a proof – Jake Westfall Aug 19 '18 at 19:15
  • +1 Great answer. Given that the real interest of @PatrickS.Forscher, as he explained in the comments, is in his Bonus situation with two crossed random terms, do you think one can apply exactly the same test in that case, simply leaving `(condition | stimulus)` term unchanged? – amoeba Aug 19 '18 at 19:26
  • Also, I am somewhat confused by the footnote number 1: you say that we can compare `(0 + dummy(condition, "control") | subject) + (0 + dummy(condition, "experimental") | subject)` with `(1 | subject)`, right? However, isn't my counterexample from another thread here showing that this is not valid? If all subjects that have responses $a$ in control condition have responses $-a$ in the experimental condition, then the 1st model will fit better than the 2nd, despite variance being equal. – amoeba Aug 19 '18 at 19:30
  • @amoeba I don't have a proof for the bonus situation, but I think since it's still the variance of the participant means we're interested in, not the stimulus means, we could still apply the same test I described....I think that's right, but I'm not positive.. – Jake Westfall Aug 19 '18 at 19:52
  • @amoeba Hmm...you may be right, but I'm not sure. (Yes that's the test I was thinking.) I think the flaw in Henrik's proposal stems from the fact it's a 2 degree-of-freedom test, where the larger model not only allows unequal variances, but also variable treatment effects, so it's really testing 2 things at once. My footnote refers to a 1 D.F. test which I was thinking avoided this problem, but I'm not positive. Since my large model just models the random means in each condition and their variances, but in your scenario the 2 variances are equal, it seems that it shouldn't fit any better, no? – Jake Westfall Aug 19 '18 at 20:01
  • @JakeWestfall In my toy example subjects have flipped responses in the two conditions. If a subject has response $a$ in condition A and $-a$ in condition B, then what would be the BLUP value of the random intercept for this subject when we use the `(1 | subject)` model? I think it can only be 0. If all subjects have BLUPs equal to zero, then the variance of the random intercept is also zero. So this model is unable to fit this toy example at all. In contrast, the model defined above via `dummy` will have two BLUPs for each subject, and they can easily be $a$ and $-a$. Am I missing something here? – amoeba Aug 19 '18 at 20:56
  • I see now that you're right @amoeba, thanks for explaining. I'll edit my answer accordingly. – Jake Westfall Aug 19 '18 at 21:00
  • One thing that worries me about your suggestion is that the model defined via `||` (your `mod1`) does not really *force* the correlation to be zero. AFAIK one can often have a situation where one uses a model with the correlation parameter suppressed, but the BLUPs end up being correlated, possibly even strongly. So let's imagine the real variances are very different. Your `mod2` will estimate a non-zero correlation. Your `mod1` will not estimate any correlation parameter but will end up with intercept/slope BLUPs that are correlated. Will the LR test still work as intended? – amoeba Aug 19 '18 at 21:28
  • @amoeba You're right that it's possible that the BLUPs can come out correlated even without a correlation parameter in the model. But I believe that for testing purposes the procedure still works as intended (e.g., it has the nominal type 1 error rate) because only the model with the correlation parameter is able to incorporate that into the likelihood function and thereby "receive credit" for that. That is, even if the BLUPs come out correlated in the simpler model, it's still as if the effects are uncorrelated as far as the total likelihood is concerned, so the LR test will work. I think :) – Jake Westfall Aug 19 '18 at 21:37
  • @JakeWestfall nice answer, and I think you are right with your comment regarding Henrik's approach. Restricting two parameters does test a different hypothesis (see my A). – statmerkur Nov 30 '19 at 14:26

You can test the significance of model parameters with the help of estimated confidence intervals, for which the lme4 package provides the confint.merMod function.

bootstrapping (see for instance Confidence Interval from bootstrap)

> confint(m, method="boot", nsim=500, oldNames= FALSE)
Computing bootstrap confidence intervals ...
                                                           2.5 %     97.5 %
sd_(Intercept)|participant_id                         0.32764600 0.64763277
cor_conditionexperimental.(Intercept)|participant_id -1.00000000 1.00000000
sd_conditionexperimental|participant_id               0.02249989 0.46871800
sigma                                                 0.97933979 1.08314696
(Intercept)                                          -0.29669088 0.06169473
conditionexperimental                                 0.26539992 0.60940435 

likelihood profile (see for instance What is the relationship between profile likelihood and confidence intervals?)

> confint(m, method="profile", oldNames= FALSE)
Computing profile confidence intervals ...
                                                          2.5 %     97.5 %
sd_(Intercept)|participant_id                         0.3490878 0.66714551
cor_conditionexperimental.(Intercept)|participant_id -1.0000000 1.00000000
sd_conditionexperimental|participant_id               0.0000000 0.49076950
sigma                                                 0.9759407 1.08217870
(Intercept)                                          -0.2999380 0.07194055
conditionexperimental                                 0.2707319 0.60727448

  • There is also a method 'Wald' but this is applied to fixed effects only.

  • There is also an anova (likelihood ratio) type of test in the package lmerTest, named ranova, but I cannot quite make sense of it. The distribution of the differences in log-likelihood, when the null hypothesis (zero variance for the random effect) is true, is not chi-square distributed (possibly when the number of participants and trials is high the likelihood ratio test might make sense); a rough parametric-bootstrap sketch of that null distribution is shown after this list.
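A rough sketch of such a parametric bootstrap of the likelihood-ratio null distribution (the helper name pboot_lrt and the value of nsim are made up here for illustration; both models should be fitted with REML = FALSE):

# simulate responses from the smaller (null) model, refit both models on each
# simulated response, and recompute the likelihood-ratio statistic
pboot_lrt <- function(m_small, m_big, nsim = 500) {
  obs  <- as.numeric(2 * (logLik(m_big) - logLik(m_small)))
  ysim <- simulate(m_small, nsim = nsim)
  stat <- sapply(ysim, function(y)
    as.numeric(2 * (logLik(refit(m_big, y)) - logLik(refit(m_small, y)))))
  mean(stat >= obs)   # bootstrap p-value
}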


Variance in specific groups

To obtain results for variance in specific groups you could reparameterize

# different model with alternative parameterization (and also correlation taken out) 
fml1 <- "~ condition + (0 + control + experimental || participant_id) "

Here we add two indicator columns to the data frame. This is only needed if you wish to model 'control' and 'experimental' as uncorrelated; the term (0 + condition || participant_id) would not treat the different levels of condition as uncorrelated.

#adding extra columns for control and experimental (by name, so no other columns get renamed)
d$control      <- as.numeric(d$condition=='control')
d$experimental <- 1 - d$control

Now lmer will give variance for the different groups

> m <- lmer(paste("sim_1 ", fml1), data=d)
> m
Linear mixed model fit by REML ['lmerModLmerTest']
Formula: paste("sim_1 ", fml1)
   Data: d
REML criterion at convergence: 2408.186
Random effects:
 Groups           Name         Std.Dev.
 participant_id   control      0.4963  
 participant_id.1 experimental 0.4554  
 Residual                      1.0268  
Number of obs: 800, groups:  participant_id, 40
Fixed Effects:
          (Intercept)  conditionexperimental  
               -0.114                  0.439 

And you can apply the profile method to these. For instance, confint now gives confidence intervals for the control and experimental variances.

> confint(m, method="profile", oldNames= FALSE)
Computing profile confidence intervals ...
                                    2.5 %     97.5 %
sd_control|participant_id       0.3490873 0.66714568
sd_experimental|participant_id  0.3106425 0.61975534
sigma                           0.9759407 1.08217872
(Intercept)                    -0.2999382 0.07194076
conditionexperimental           0.1865125 0.69149396
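To get an interval for the difference between the two standard deviations itself (rather than one interval for each), a parametric bootstrap via bootMer is one possibility (a sketch; the value of nsim and the percentile interval are arbitrary choices here):

diff_sd <- function(fit) {
  vc <- as.data.frame(VarCorr(fit))
  vc$sdcor[1] - vc$sdcor[2]          # sd_control - sd_experimental
}
b <- bootMer(m, diff_sd, nsim = 500, seed = 1)
quantile(b$t[, 1], c(0.025, 0.975))  # percentile interval; 0 outside it suggests a difference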

Simplicity

You could use the likelihood function to get more advanced comparisons, but there are many ways to make approximations along the road (e.g. you could do a conservative anova/lrt-test, but is that what you want?).

At this point it makes me wonder what the point of this (not so common) comparison between variances actually is. I wonder whether it starts to become too sophisticated. Why the difference between variances instead of the ratio of variances (which relates to the classical F-distribution)? Why not just report confidence intervals? We need to take a step back, and clarify the data and the story it is supposed to tell, before going into advanced pathways that may be superfluous and lose touch with the statistical matter and the statistical considerations that are actually the main topic.

I wonder whether one should do much more than simply state the confidence intervals (which may actually tell much more than a hypothesis test: a hypothesis test gives a yes/no answer but no information about the actual spread of the population, and given enough data any slight difference can be reported as significant). Going more deeply into the matter (for whatever purpose) requires, I believe, a more specific (narrowly defined) research question to guide the mathematical machinery toward the proper simplifications (even when an exact calculation is feasible, or when it can be approximated by simulation/bootstrapping, it may still require appropriate interpretation). Compare with Fisher's exact test, which solves a (particular) question (about contingency tables) exactly, but which may not be the right question.

Simple example

To provide an example of the simplicity that is possible, I show below a comparison (by simulation) of two assessments of the difference between the two group variances: an F-test based on comparing the variances of the individual mean responses, and an F-test based on comparing the mixed-model-derived variances.

For the F-test we simply compare the variance of the values (means) of the individuals in the two groups. Those means are for condition $j$ distributed as:

$$\hat{Y}_{i,j} \sim N(\mu_j, \sigma_j^2 + \frac{\sigma_{\epsilon}^2}{10})$$

If the measurement error variance $\sigma_\epsilon$ is equal for all individuals and conditions, and if the variances for the two conditions $\sigma_{j}$ (with $j \in \lbrace 1,2 \rbrace$) are equal, then the ratio of the variance of the 40 means in condition 1 to the variance of the 40 means in condition 2 follows an F-distribution with 39 and 39 degrees of freedom for numerator and denominator.
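For the data simulated in the question, this means-based F-test is a one-liner (a sketch; note that it compares the variances of the per-participant means, which still contain the $\sigma_{\epsilon}^2/10$ term):

# per-participant mean response in each condition, then a classical F test
imeans <- aggregate(sim_1 ~ participant_id + condition, data = d, FUN = mean)
var.test(sim_1 ~ condition, data = imeans)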

You can see this in the simulation shown in the graph below, where, aside from the F-score based on sample means, an F-score is also calculated based on the predicted variances (or sums of squared errors) from the model.

[Figure: histograms of the means-based F-score and the mixed-model-based F-score under equal variances, with the F(39,39) density overlaid]

The figure is based on 10 000 simulation repetitions using $\sigma_{j=1} = \sigma_{j=2} = 0.5$ and $\sigma_\epsilon=1$.

You can see that there is some difference. This difference may be due to the fact that the mixed-effects linear model obtains the sums of squared errors (for the random effect) in a different way. These squared error terms are no longer well expressed as a simple chi-squared distribution, but they are still closely related and can be approximated.

Aside from the (small) difference when the null hypothesis is true, the more interesting case is when the null hypothesis is not true, especially when $\sigma_{j=1} \neq \sigma_{j=2}$. The distribution of the means $\hat{Y}_{i,j}$ depends not only on those $\sigma_j$ but also on the measurement error $\sigma_\epsilon$. In the mixed-effects model this latter error is 'filtered out', so the F-score based on the random-effects-model variances is expected to have higher power.

[Figure: histograms of the means-based F-score and the mixed-model-based F-score under unequal variances]

The figure is based on 10 000 simulation repetitions using $\sigma_{j=1} = 0.5$, $\sigma_{j=2} = 0.25$ and $\sigma_\epsilon=1$.

So the model based on the means is very exact. But it is less powerful. This shows that the correct strategy depends on what you want/need.

In the example above, when you set the right-tail boundaries at 2.1 and 3.1, you get approximately 1% of the population in the case of equal variances (103 and 104 of the 10 000 cases, respectively), but in the case of unequal variances the counts beyond these boundaries differ a lot (5334 and 6716 of the 10 000 cases).

code:

set.seed(23432)

# different model with alternative parameterization (and also correlation taken out)
fml1 <- "~ condition + (0 + control + experimental || participant_id) "
fml <- "~ condition + (condition | participant_id)"

n <- 10000

theta_m <- matrix(rep(0,n*2),n)
theta_f <- matrix(rep(0,n*2),n)

# initial data frame later changed into d by adding a sixth sim_1 column
ds <- expand.grid(participant_id=1:40, trial_num=1:10)
ds <- rbind(cbind(ds, condition="control"), cbind(ds, condition="experimental"))
  #adding extra columns for control and experimental
  ds <- cbind(ds,as.numeric(ds$condition=='control'))
  ds <- cbind(ds,1-as.numeric(ds$condition=='control'))
  names(ds)[c(4,5)] <- c("control","experimental")

# defining variances for the population of individual means
stdevs <- c(0.5,0.5) # c(control,experimental)

pb <- txtProgressBar(title = "progress bar", min = 0,
                    max = n, style=3)
for (i in 1:n) {

  indv_means <- c(rep(0,40)+rnorm(40,0,stdevs[1]),rep(0.5,40)+rnorm(40,0,stdevs[2]))
  fill <- indv_means[ds[,1]+ds[,5]*40]+rnorm(80*10,0,sqrt(1)) #using a different way to make the data, because simulate() does not create independent data in the two groups
  #fill <- suppressMessages(simulate(formula(fml), 
  #                     newparams=list(beta=c(0, .5), 
  #                                    theta=c(.5, 0, 0), 
  #                                    sigma=1), 
  #                     family=gaussian, 
  #                     newdata=ds))
  d <- cbind(ds, fill)
  names(d)[6] <- c("sim_1")


  m <- lmer(paste("sim_1 ", fml1), data=d)
  m
  theta_m[i,] <- m@theta^2

  imeans <- aggregate(d[, 6], list(d[,c(1)],d[,c(3)]), mean)
  theta_f[i,1] <- var(imeans[c(1:40),3])
  theta_f[i,2] <- var(imeans[c(41:80),3])

  setTxtProgressBar(pb, i)
}
close(pb)

p1 <- hist(theta_f[,1]/theta_f[,2], breaks = seq(0,6,0.06))       
fr <- theta_m[,1]/theta_m[,2]
fr <- fr[which(fr<30)]
p2 <- hist(fr, breaks = seq(0,30,0.06))



plot(-100,-100, xlim=c(0,6), ylim=c(0,800), 
     xlab="F-score", ylab = "counts [n out of 10 000]")
plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,6), ylim=c(0,800), add=T)  # means based F-score
plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,6), ylim=c(0,800), add=T)  # model based F-score
fr <- seq(0, 4, 0.01)
lines(fr,df(fr,39,39)*n*0.06,col=1)
legend(2, 800, c("means based F-score","mixed regression based F-score"), 
       fill=c(rgb(0,0,1,1/4),rgb(1,0,0,1/4)),box.col =NA, bg = NA)
legend(2, 760, c("F(39,39) distribution"), 
       lty=c(1),box.col = NA,bg = NA)
title(expression(paste(sigma[1]==0.5, " , ", sigma[2]==0.5, " and ", sigma[epsilon]==1)))
Sextus Empiricus
  • That's useful but does not seem to address the question about how to compare variances in two conditions. – amoeba Aug 13 '18 at 13:11
  • @amoeba I found that this answer gives the core of the issue (about testing the random variance components). What the OP precisely wants is difficult to read in the entire text. What does "the random intercept variances" refer to? (the plural in relation to intercept confuses me) One possible case might be to use the model `sim_1 ~ condition + (0 + condition | participant_id)"` in which case you get a parameterization into two parameters (one for each group) rather than two parameters one for the intercept and one for the effect (which need to be combined for the groups). – Sextus Empiricus Aug 13 '18 at 13:27
  • Each subject has some mean response in condition A and some mean response in condition B. The question is whether the variance across subjects in A is different from the variance across subjects in B. – amoeba Aug 13 '18 at 13:38
  • This does not complete the task posed in the title "Compare random variance component across levels of a grouping variable". I noticed that there was a confusing typo in the body of the question, which I've fixed. I also tried to further clarify the wording of the question. – Patrick S. Forscher Aug 13 '18 at 14:18
  • It might be possible to answer the question using `car::linearHypothesisTest` (http://math.furman.edu/~dcs/courses/math47/R/library/car/html/linear.hypothesis.html), which allows the user to test arbitrary hypotheses with a fitted model. However, I'd have to use @amoeba's method to obtain both random intercepts in the same fitted model so they can be compared with this function. I'm also a little uncertain as to the validity of the method. – Patrick S. Forscher Aug 13 '18 at 14:24
  • You could plot the likelihood profile for the random error variance for the cases condition A and condition B. Then see whether the curve of equal variance falls outside the desired confidence region. I will add that to the answer. – Sextus Empiricus Aug 13 '18 at 14:28
  • What you now suggest with `0 + control + experimental` is exactly what I wrote in the comments as `(0 + dummy(condition, "control") | participant_id) + (0 + dummy(condition, "experimental") | participant_id)`, right? – amoeba Aug 13 '18 at 15:39
  • @amoeba Yes indeed. – Sextus Empiricus Aug 13 '18 at 15:42
  • Thanks, the edit is definitely helpful! However, I think this answer is still somewhat incomplete because it does not test the hypothesis `sd_control|participant_id - sd_experimental|participant_id = 0`. I would also like to know something about the validity of the suggested technique. – Patrick S. Forscher Aug 13 '18 at 16:04
  • My personal goal is very pragmatic: I've been asked to test (a more complex version of) this question by a reviewer and I wasn't certain how to even approach the issue. If an F-ratio is a more appropriate approach I'm certainly willing to do it. Mostly I want to know (1) that whatever I'm doing answers the reviewer's question and (2) that the approach has some principled theoretical basis – Patrick S. Forscher Aug 13 '18 at 16:51
  • I had displayed a graph of the (estimated) participant intercepts, and there seemed to be more spread in those intercepts in one condition than another. A reviewer asked out of curiosity whether this was true (beyond what one would expect due to sampling error). In this case there is no deeper reason than that. – Patrick S. Forscher Aug 13 '18 at 17:20
  • I'll think about your point -- it is a reasonable one. I'm going to hold off on accepting an answer or awarding the bounty for the time being, but this is a useful answer and contribution. – Patrick S. Forscher Aug 13 '18 at 19:36

One relatively straightforward way could be to use likelihood-ratio tests via anova, as described in the lme4 FAQ.

We start with a full model in which the variances are unconstrained (i.e., two different variances are allowed) and then fit one constrained model in which the two variances are assumed to be equal. We simply compare them with anova() (note that I set REML = FALSE although REML = TRUE with anova(..., refit = FALSE) is completely feasible).

m_full <- lmer(sim_1 ~ condition + (condition | participant_id), data=d, REML = FALSE)
summary(m_full)$varcor
 # Groups         Name                  Std.Dev. Corr  
 # participant_id (Intercept)           0.48741        
 #                conditionexperimental 0.26468  -0.419
 # Residual                             1.02677     

m_red <- lmer(sim_1 ~ condition + (1 | participant_id), data=d, REML = FALSE)
summary(m_red)$varcor
 # Groups         Name        Std.Dev.
 # participant_id (Intercept) 0.44734 
 # Residual                   1.03571 

anova(m_full, m_red)
# Data: d
# Models:
# m_red: sim_1 ~ condition + (1 | participant_id)
# m_full: sim_1 ~ condition + (condition | participant_id)
#        Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
# m_red   4 2396.6 2415.3 -1194.3   2388.6                         
# m_full  6 2398.7 2426.8 -1193.3   2386.7 1.9037      2      0.386

However, this test is likely conservative. For example, the FAQ says:

Keep in mind that LRT-based null hypothesis tests are conservative when the null value (such as σ2=0) is on the boundary of the feasible space; in the simplest case (single random effect variance), the p-value is approximately twice as large as it should be (Pinheiro and Bates 2000).

There are several alternatives:

  1. Create an appropriate test distribution, which usually consists of a mixture of $\chi^2$ distributions. See e.g.,
    Self, S. G., & Liang, K.-Y. (1987). Asymptotic Properties of Maximum Likelihood Estimators and Likelihood Ratio Tests Under Nonstandard Conditions. Journal of the American Statistical Association, 82(398), 605. https://doi.org/10.2307/2289471 However, this is quite complicated.

  2. Simulate the correct distribution using RLRsim (as also described in the FAQ).

I will demonstrate the second option in the following:

library("RLRsim")
## reparametrize model so we can get one parameter that we want to be zero:
afex::set_sum_contrasts() ## warning, changes contrasts globally
d <- cbind(d, difference = model.matrix(~condition, d)[,"condition1"])

m_full2 <- lmer(sim_1 ~ condition + (difference | participant_id), data=d, REML = FALSE)
all.equal(deviance(m_full), deviance(m_full2))  ## both full models are identical

## however, we need the full model without correlation!
m_full2b <- lmer(sim_1 ~ condition + (1| participant_id) + 
                   (0 + difference | participant_id), data=d, REML = FALSE)
summary(m_full2b)$varcor
 # Groups           Name        Std.Dev.
 # participant_id   (Intercept) 0.44837 
 # participant_id.1 difference  0.13234 
 # Residual                     1.02677 

## model that only has random effect to be tested
m_red <- update(m_full2b,  . ~ . - (1 | participant_id), data=d, REML = FALSE)
summary(m_red)$varcor
 # Groups         Name       Std.Dev.
 # participant_id difference 0.083262
 # Residual                  1.125116

## Null model 
m_null <- update(m_full2b,  . ~ . - (0 + difference | participant_id), data=d, REML = FALSE)
summary(m_null)$varcor
 # Groups         Name        Std.Dev.
 # participant_id (Intercept) 0.44734 
 # Residual                   1.03571 

exactRLRT(m_red, m_full2b, m_null)
# Using restricted likelihood evaluated at ML estimators.
# Refit with method="REML" for exact results.
# 
#   simulated finite sample distribution of RLRT.
#   
#   (p-value based on 10000 simulated values)
# 
# data:  
# RLRT = 1.9698, p-value = 0.0719

As we can see, the output suggests that with REML = TRUE we would have gotten exact results. But this is left as an exercise to the reader.
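A sketch of that exercise (the `_reml` suffixes are just names chosen here): refit the three models with REML (the lmer default) and rerun the exact test.

m_full2b_reml <- lmer(sim_1 ~ condition + (1 | participant_id) +
                        (0 + difference | participant_id), data = d)
m_red_reml    <- lmer(sim_1 ~ condition + (0 + difference | participant_id), data = d)
m_null_reml   <- lmer(sim_1 ~ condition + (1 | participant_id), data = d)
exactRLRT(m_red_reml, m_full2b_reml, m_null_reml)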

Regarding the bonus, I am not sure if RLRsim allows simultaneous testing of multiple components, but if so, this can be done in the same way.


Response to comment:

So it is true, then, that in general the random slope $\theta_X$ allows the random intercept $\theta_0$ to vary across levels of $X$?

I am not sure this question can receive a reasonable answer.

  • A random intercept allows an idiosyncratic difference in the overall level for each level of the grouping factor. For example, if the dependent variable is response time, some participants are faster and some are slower.
  • A random slope allows each level of the grouping factor an idiosyncratic effect of the factor for which random slopes are estimated. For example, if the factor is congruency, then some participants can have a higher congruency effect than others.

So do random slopes affect the random intercept? In some sense this might make sense, as they allow each level of the grouping factor a completely idiosyncratic effect for each condition. In the end, we estimate two idiosyncratic parameters for two conditions. However, I think the distinction between the overall level captured by the intercept and the condition-specific effect captured by the random slope is an important one, and then the random slope cannot really affect the random intercept. It still allows each level of the grouping factor an idiosyncratic effect separately for each level of the condition.

Nevertheless, my test still does what the original question wants. It tests whether the difference in variances between the two conditions is zero. If it is zero, then the variances in both conditions are equal. In other words, only if there is no need for a random-slope is the variance in both conditions identical. I hope that makes sense.

Henrik
  • Thanks (+1)! Does adding `(condition | participant_id)` (as opposed to `(1 | participant_id)` ) allow the participant intercept variances to vary across levels of the `condition` variable? I was not aware of this interpretation! So in a sense my question is about whether the random effect for `condition` is non-zero -- is this right? – Patrick S. Forscher Aug 13 '18 at 14:55
  • You use treatment contrasts (`contr.treatment`) for which the control condition is the reference (i.e., for which the random intercept is calculated). The parametrization I propose uses sum contrasts (i.e., `contr.sum`) and the intercept is the grand mean. I feel like it makes more sense to test whether the difference is null when the intercept is the grand mean instead of the control condition (but writing it down suggests it might be relatively inconsequential). You might want to read pp. 24 to 26 of: http://singmann.org/download/publications/singmann_kellen-introduction-mixed-models.pdf – Henrik Aug 13 '18 at 15:05
  • Thanks! My questions are slightly different, though: (1) Your answer seems to imply that my question reduces to "is the random slope for condition different from 0". Is this true? (2) If the answer to (1) is "yes", this suggests another interpretation of the random slope for `condition`: it allows the random intercept to vary across levels of `condition`. Is this true? – Patrick S. Forscher Aug 13 '18 at 15:26
  • You say "I would like to statistically compare the size of the by-participant random intercept variance across the groups defined by condition". This is statistically equivalent to testing whether the parameter estimating the difference between the two conditions is zero. This is what is done here, so it answers your question. Re: your comment. The interpretation of a random component critically depends on the parametrization of the model. It simply gives each level of the grouping factor (i.e., `participant_id`) an idiosyncratic offset for the corresponding parameter. – Henrik Aug 13 '18 at 16:28
  • Interesting. So it is true, then, that in general the random slope $\theta_X$ allows the random intercept $\theta_0$ to vary across levels of $X$? Do you have a reference you could recommend where I could read about this? This is a revelation for me! – Patrick S. Forscher Aug 13 '18 at 16:41
  • I think there is some potential confusion in the comments here. What we have in this example is essentially a bunch of subjects with some responses in condition A and some in condition B. It's almost a paired t-test situation. @Patrick is asking whether the across-subject variance in A is the same as in B. That's the question of heterogeneity. Now, imagine that the grand means in A and B are both 0 but for each subject with response $a$ in A, the response is $-a$ in B. Then the variances are equal, but the random slope var (in the random intercept+slope model) will be non-zero. – amoeba Aug 13 '18 at 17:02
  • I am not sure I understand what exactly you are suggesting, so I think it is safer to answer no. More generally, you need to understand the parametrization of your model (i.e., how the categorcial variables are transformed into numerical variables) and what the random-effects do to those parameters. Our chapter which I have linked above explains both in more details, especially the random-effects part. – Henrik Aug 13 '18 at 17:03
  • If the answer to my question is "no", then I'm not sure your answer addresses my question. See @amoeba's comment -- I want to know whether participants' average responses are more variable in one condition than in another. – Patrick S. Forscher Aug 13 '18 at 17:27
  • @PatrickS.Forscher Please see my edit. – Henrik Aug 13 '18 at 18:04
  • I disagree with this sentence: "In other words, only if there is no need for a random-slope is the variance in both conditions identical." Specifically I disagree with the "only". If there is no need for a random slope then the variance in both conditions is identical, yes. But my comment above gives an example when random slope *is* needed but the variances are nevertheless identical. – amoeba Aug 13 '18 at 18:20
  • @amoeba No, what you are saying is wrong, because it does not consider how these values arise. A model can only allow participant $p_1$ value $-x$ in condition $A$ and $+x$ in condition $B$ and at the same time participants $p_2$ value $+x$ in condition $A$ and $-x$ in condition $B$ with a random-slope. A random-intercept model alone cannot do that. It could still be that the magnitude of the random-slope and intercept are identical, but the question of whether or not the variability in each condition is identical is sufficiently captured by the presence or absence of the random slope. – Henrik Aug 13 '18 at 18:26
  • Hmm. No, I still disagree with this. For concreteness, let me give an explicit toy example. There are six subjects; values for condition A are 1,2,3,4,5,6 and values for condition B (in the same order) are 6,5,4,3,2,1. If we fit a `y ~ condition + (condition | id)` model, there will be some non-zero random slope variance. This model will fit much better than `y ~ condition + (1 | id)` (which will not be able to fit these data at all). *However*, the variance across subjects in condition A is identical to the variance across subjects in condition B. – amoeba Aug 13 '18 at 19:33
  • Given the edit, I'm still having trouble understanding how your solution addresses my question. When you remove the `condition` from `(condition | participant_id)` to obtain `(1 | participant_id)`, you remove the random slope for condition (i.e., you allow idiosyncratic differences in the condition effect for each participant, to use your terminology). Unless removing the random slope somehow constrains the random by-participant intercepts to be equal, your LRT tests whether these slopes are non-zero, not whether the random intercepts are similar across conditions. – Patrick S. Forscher Aug 13 '18 at 19:33
  • My 2¢: @amoeba 's counterexample to Henrik's proposed procedure is correct. Henrik is _nearly_ correct, but he compares the wrong pair of models. The model comparison that answers Patrick's question is the comparison between the models Henrik called `m_full` vs. `m_full2b`. That is: the variances of participants' conditional mean responses in A vs. B are unequal ***iff*** the random intercept-slope correlation is nonzero -- importantly, _under the sum-to-zero contrast coding parameterization_. Testing the random slope variance is not necessary. Trying to think how to explain this succinctly... – Jake Westfall Aug 14 '18 at 02:54
  • This is not really a proper explanation, but studying [my answer here](https://stats.stackexchange.com/questions/337158/why-does-treatment-coding-result-in-a-correlation-between-random-slope-and-inter/341566#341566) may shed a little light on the matter. Basically, the correlation parameter controls whether the participant regression lines "fan out to the right" (positive corr.) or "fan out to the left" (negative corr.). Either of these imply unequal variance in participants' conditional mean responses. Sum-to-zero coding then ensures that we're looking for correlation at the right point on X – Jake Westfall Aug 14 '18 at 03:13
  • One last comment to clarify my cryptic remark about "the right point on X." Following on from the above, if the random slope-intercept correlation is 0, it means that ***relative to the point X=0*** the participant regression lines fan out to the right exactly as much as they fan out to the left. (Random slopes just imply fanning happens.) So then if we code the groups so that X=0 corresponds to halfway between the two groups -- i.e., we use sum-to-zero contrast coding -- then equal fanning out from that point translates to equal variance in participant conditional means in the two conditions. – Jake Westfall Aug 14 '18 at 03:33
  • When in doubt @JakeWestfall is probably right and I am probably wrong (as appears to be the case here). So if this is the test that needs to be done (i.e., for the correlation), one needs to figure out how to test it. As 0 is not at the boundary of the parameter space for the correlation, `anova()` between the models with and without it (i.e., `m_full` and `m_full2b`) is probably enough and not anticonservative. One could nevertheless try to use `RLRsim`, which might still be more accurate. – Henrik Aug 14 '18 at 09:12
  • @JakeWestfall I'm glad we agree about my counterexample but I don't really follow the rest of your comments. Consider posting an answer in this thread! I was thinking that a model `y ~ 0 + condition + (0 + condition | id)` explicitly models conditional variance in condition A, cond. variance in condition B, and a correlation parameter. So using that, one can do a permutation test for variance difference being equal to zero. However, I am not sure how to specify a model that would constrain variances in A and B to be the same, while still allowing for an arbitrary correlation. – amoeba Aug 14 '18 at 11:12
  • I will consider posting an answer with pictures if I can find the time... – Jake Westfall Aug 14 '18 at 13:39
  • I support that @JakeWestfall is right: one has to use sum contrasts and test if (1) dropping the correlation parameter significantly decreases the model fit or (2) one can introduce another variance component (namely one for participant-condition-combination intercepts) and test if a model with that random effects structure fits significantly worse than the full model (see my A). – statmerkur Nov 30 '19 at 14:37

Your model

m = lmer(sim_1 ~ condition + (condition | participant_id), data=d)

already allows the across-subject variance in the control condition to differ from the across-subject variance in the experimental condition. This can be made more explicit by an equivalent re-parametrization:

m = lmer(sim_1 ~ 0 + condition + (0 + condition | participant_id), data=d)

The random covariance matrix now has a simpler interpretation:

Random effects:
 Groups         Name                  Variance Std.Dev. Corr
 participant_id conditioncontrol      0.2464   0.4963       
                conditionexperimental 0.2074   0.4554   0.83

Here the two variances are precisely the two variances you are interested in: the [across-subjects] variance of conditional mean responses in the control condition and the same in the experimental condition. In your simulated dataset, they are 0.25 and 0.21. The difference is given by

delta = as.data.frame(VarCorr(m))[1,4] - as.data.frame(VarCorr(m))[2,4]

and is equal to 0.039. You want to test if it is significantly different from zero.

EDIT: I realized that the permutation test that I describe below is incorrect; it won't work as intended if the means in control/experimental condition are not the same (because then the observations are not exchangeable under the null). It might be a better idea to bootstrap subjects (or subjects/items in the Bonus case) and obtain the confidence interval for delta.

I will try to fix the code below to do that.
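
In the meantime, here is a minimal sketch of what such a participant-level (cluster) bootstrap could look like; the object names and the number of resamples are illustrative and not part of the original answer. It assumes the reparameterized model from above and resamples whole participants with replacement.

set.seed(1)
n_boot = 200                          # increase for a real analysis
boot_delta = numeric(n_boot)
ids = unique(d$participant_id)
for (b in 1:n_boot){
  # resample participants with replacement and give each copy a fresh id
  sampled = sample(ids, length(ids), replace=TRUE)
  db = do.call(rbind, lapply(seq_along(sampled), function(k){
    dk = d[d$participant_id == sampled[k], ]
    dk$participant_id = k
    dk
  }))
  mb = lmer(sim_1 ~ 0 + condition + (0 + condition | participant_id), data=db)
  vb = as.data.frame(VarCorr(mb))
  boot_delta[b] = vb[1,4] - vb[2,4]   # difference of the two condition variances
}
quantile(boot_delta, c(.025, .975))   # percentile confidence interval for delta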


Original permutation-based suggestion (wrong)

I often find that one can save oneself a lot of trouble by doing a permutation test. Indeed, in this case it is very easy to set up. Let's permute the control/experimental labels for each subject separately; then any difference in variances should be eliminated. Repeating this many times will yield the null distribution of the differences.

(I do not program in R; everybody please feel free to re-write the following in a better R style.)

set.seed(42)
nrep = 100
v = matrix(nrow=nrep, ncol=1)
for (i in 1:nrep)
{
  dp = d
  # flip the condition labels of each subject with probability 1/2
  # (index with the original d so the two assignments do not overwrite each other)
  for (s in unique(d$participant_id)){
    if (rbinom(1,1,.5)==1){
      dp[d$participant_id==s & d$condition=='control',]$condition = 'experimental'
      dp[d$participant_id==s & d$condition=='experimental',]$condition = 'control'
    }
  }
  m <- lmer(sim_1 ~ 0 + condition + (0 + condition | participant_id), data=dp)
  v[i,] = as.data.frame(VarCorr(m))[1,4] - as.data.frame(VarCorr(m))[2,4]
}
pvalue = sum(abs(v) >= abs(delta)) / nrep

Running this yields the p-value $p=0.7$. One can increase nrep to 1000 or so.

Exactly the same logic can be applied in your Bonus case.

amoeba
  • 93,463
  • 28
  • 275
  • 317
  • Super interesting, thank you! I'll have to think more about why your reparameterization works, as this seems to be the key insight of this answer. – Patrick S. Forscher Aug 15 '18 at 02:21
  • Strangely, the per-group intercept values in your answer seem to differ from those in @MartijnWeterings' answer. – Patrick S. Forscher Aug 15 '18 at 02:24
  • @PatrickS.Forscher That's because he, I think, generates a different dataset. I can use the `sim_1 ~ 0 + condition + (0 + dummy(condition, "control") + dummy(condition, "experimental") | participant_id)` formulation and get the same outcome as in my answer. – amoeba Aug 15 '18 at 06:25
  • Ah, I see, you set the seed to `42` instead of the `23432` I used in my question. – Patrick S. Forscher Aug 15 '18 at 14:18
  • 1
    @PatrickS.Forscher No, I used the data generated by your code (with your seed). I set the seed to 42 only when performing the permutation testing. It's Martijn who changed the dataset, not me. – amoeba Aug 15 '18 at 14:21
  • Shall I explain why the reparametrization works in more detail? Please note that the original parameterization *already* allows different variances in the control and experimental conditions. (This was, by the way, not immediately clear to me and I might have written somewhat misleading comments under your Q in the beginning.) It's just that with the original parametrization it's not so easy to extract the value of `delta`. When I suppress the intercept with `0 + ` in the formula, it becomes very easy to get `delta`. – amoeba Aug 15 '18 at 19:38
  • My data is indeed different. That is because in the original code the model (which now has one parameter less) is used to define the data. But I could have used the old data (and at certain points did); that's why I named the model fml1, so that I could still use the old fml to build the data. I chose to generate data based on different true variances (which for this particular seed are actually not so different). So yes, the principle is the same, and the results are the same when you run my code with your data. – Sextus Empiricus Aug 15 '18 at 20:17
  • amoeba, yes, a bit of explanation about that reparameterization would be helpful if you have the time! – Patrick S. Forscher Aug 16 '18 at 22:07
  • @PatrickS.Forscher I thought I already did: see the beginning of my answer. HOWEVER, I realized that my permutation scheme is actually incorrect. If the means in control/experimental conditions are not the same, then it's a very poor test for equality of variances. I think one can fix that by permuting the deviations from the mean. Alternatively, one can bootstrap the subjects and obtain the confidence interval for the difference in variances. I think that might be a better approach. I will try to edit my answer accordingly when I have some time. – amoeba Aug 17 '18 at 19:55
  • 1
    This proposal is definitely sound. As I think you've already experienced, setting up permutation tests for multilevel data is not entirely straightforward. A similar approach that would be a bit easier to implement would be parametric bootstrapping, which is pretty simple to do with lme4 using the simulate() method of the fitted lmer objects, i.e., call simulate(m) many times to build up the bootstrap distribution. Just an idea to play around with. – Jake Westfall Aug 20 '18 at 14:33
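
To make that last suggestion concrete, here is a minimal sketch (not part of any answer above) of how lme4's `simulate()` and `refit()` could be used to build a parametric-bootstrap distribution for `delta`; it assumes `m` is the reparameterized model fitted at the top of this answer.

set.seed(7)
n_sim = 200                                # number of parametric-bootstrap replicates
sims = simulate(m, nsim=n_sim)             # each column is a simulated response vector
boot_delta = sapply(1:n_sim, function(i){
  mi = refit(m, sims[[i]])                 # refit the same model to the simulated responses
  vi = as.data.frame(VarCorr(mi))
  vi[1,4] - vi[2,4]                        # difference of the two condition variances
})
quantile(boot_delta, c(.025, .975))        # parametric-bootstrap interval for delta

Simulating instead from a model constrained to equal variances would yield a null distribution for `delta`, i.e. a parametric-bootstrap test rather than an interval.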
0

Looking at this problem from a slightly different perspective and starting from the "general" form of the linear mixed model, we have $$ y_{ijk} = \mu + \alpha_j + d_{ij} + e_{ijk}, \quad d_{i} \sim N(0, \Sigma), \quad e_{ijk} \sim N(0, \sigma^2) $$ where $\alpha_j$ is the fixed effect of the $j$'th condition and $d_i = (d_{i1}, \ldots, d_{iJ})^\top$ is a random vector (some call it a vector-valued random effect, I think) for the $i$'th participant, whose component $d_{ij}$ enters that participant's observations in the $j$'th condition.
In your example we have two conditions, which I'll denote as $A$ and $B$ in what follows, with responses $y_{i1k}$ and $y_{i2k}$. So the covariance matrix of the two-dimensional random vector $d_{i}$ is of the general form

$$\Sigma = \begin{bmatrix}\sigma^2_A & \sigma_{AB}\\ \sigma_{AB} & \sigma^2_{B} \end{bmatrix} $$

with non-negative $\sigma^2_A$ and $\sigma^2_{B}$.

Let's first see how the re-parameterized version of $\Sigma$ looks when we use sum contrasts.

The variance of the intercept, which corresponds to the grand mean, is

$$ \sigma^2_{1} := \text{Var(grand mean)} = \text{Var}(\frac{1}{2} \cdot (A+B)) = \frac{1}{4} \cdot (\text{Var}(A) + \text{Var}(B) + 2 \cdot \text{Cov}(A,B)). $$

The variance of the contrast is

$$ \sigma^2_{2} := \text{Var(contrast)} = \text{Var}(\frac{1}{2} \cdot (A-B)) = \frac{1}{4} \cdot (\text{Var}(A) + \text{Var}(B) - 2 \cdot \text{Cov}(A,B)). $$

And the covariance between the intercept and the contrast is

$$ \sigma_{12} := \text{Cov}(\text{grand mean, contrast}) = \text{Cov}(\frac{1}{2} \cdot (A+B), \frac{1}{2} \cdot (A-B)) \\ = \frac{1}{4} \cdot (\text{Var}(A) - \text{Var}(B)). $$

Thus, the re-parameterized $\Sigma$ is

$$\Sigma = \begin{bmatrix}\sigma^2_1 + \sigma^2_2 + 2\sigma_{12} & \sigma^2_1 - \sigma^2_2\\ \sigma^2_1 - \sigma^2_2 & \sigma^2_1 + \sigma^2_2 - 2\sigma_{12} \end{bmatrix} = \begin{bmatrix}\sigma^2_A & \sigma_{AB}\\ \sigma_{AB} & \sigma^2_{B} \end{bmatrix}. $$

$\Sigma$ can be decomposed into

$$ \Sigma = \begin{bmatrix}\sigma^2_1 & \sigma^2_1\\ \sigma^2_1 & \sigma^2_1\end{bmatrix} + \begin{bmatrix}\sigma^2_2 & -\sigma^2_2\\ -\sigma^2_2 & \sigma^2_2\end{bmatrix} + 2\begin{bmatrix}\sigma_{12} & 0\\ 0 & -\sigma_{12}\end{bmatrix}. $$

Setting the covariance parameter $\sigma_{12}$ to zero we get

$$ \Sigma = \begin{bmatrix}\sigma^2_1 & \sigma^2_1\\ \sigma^2_1 & \sigma^2_1\end{bmatrix} + \begin{bmatrix}\sigma^2_2 & -\sigma^2_2\\ -\sigma^2_2 & \sigma^2_2\end{bmatrix} = \begin{bmatrix}\sigma^2_1 + \sigma^2_2 & \sigma^2_1 - \sigma^2_2 \\ \sigma^2_1 - \sigma^2_2 & \sigma^2_1 + \sigma^2_2\end{bmatrix} $$

which, as @Jake Westfall derived in a slightly different way, tests the hypothesis of equal variances when we compare a model without this covariance parameter to a model in which the covariance parameter is freely estimated (i.e., not set to zero).
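
As a quick numerical check of this algebra (a sketch only; the object names are illustrative and not part of the original answer), one can fit the model with a +1/-1 contrast, read off $\sigma^2_1$, $\sigma^2_2$, and $\sigma_{12}$, and reconstruct the condition variances and covariance:

d$contrast <- 2*(d$condition == 'experimental') - 1
m_sum <- lmer(sim_1 ~ contrast + (contrast | participant_id), data = d)
vc <- as.data.frame(VarCorr(m_sum))
s1_sq <- vc[1, 4]   # sigma_1^2: intercept (grand mean) variance
s2_sq <- vc[2, 4]   # sigma_2^2: contrast variance
s12   <- vc[3, 4]   # sigma_12: intercept-contrast covariance
c(s1_sq + s2_sq + 2*s12,   # variance in the condition coded +1 (experimental)
  s1_sq + s2_sq - 2*s12,   # variance in the condition coded -1 (control)
  s1_sq - s2_sq)           # covariance between the two conditions

Up to numerical tolerance, these should match the entries of $\Sigma$ from the `0 + condition` parameterization used in @amoeba's answer.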

Notably, introducing another crossed random grouping factor (such as stimuli) does not change the model comparison that has to be done, i.e., anova(mod1, mod2) (optionally with the argument refit = FALSE when you use REML estimation), where mod1 and mod2 are defined as in @Jake Westfall's answer and reproduced in the code below.
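
For the Bonus case, such a comparison could look as follows (a sketch only; it assumes d is the crossed participants-by-stimuli dataset from the question, and the model names are illustrative):

d$contrast <- 2*(d$condition == 'experimental') - 1

# full by-participant structure (correlated random intercept and contrast)
mod2_crossed <- lmer(sim_1 ~ contrast + (contrast | participant_id) + (contrast | stimulus_id),
                     data = d, REML = FALSE)

# same model, but with the by-participant intercept-contrast covariance removed
mod1_crossed <- lmer(sim_1 ~ contrast + (contrast || participant_id) + (contrast | stimulus_id),
                     data = d, REML = FALSE)

anova(mod1_crossed, mod2_crossed)  # tests equal by-participant variances across conditions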

Taking out $\sigma_{12}$ and the variance component for the contrast $\sigma^2_2$ (what @Henrik suggests) results in

$$ \Sigma = \begin{bmatrix}\sigma^2_1 & \sigma^2_1\\ \sigma^2_1 & \sigma^2_1\end{bmatrix} $$

which tests the hypothesis that the variances in the two conditions are equal and that they are equal to the (positive) covariance between the two conditions.
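
In lmer syntax this reduced model is simply (a sketch; the name is illustrative)

mod_int <- lmer(sim_1 ~ condition + (1 | participant_id), data = d, REML = FALSE)

and comparing it against the unconstrained model (mod2 below) assesses this composite hypothesis; note that $\sigma^2_2 = 0$ lies on the boundary of the parameter space, so the naive likelihood-ratio test is conservative.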


When we have two conditions, a model that fits a covariance matrix with a (positive) compound symmetric structure, i.e., with two parameters, can also be written as

# code snippet from Jake Westfall
d$contrast <- 2*(d$condition == 'experimental') - 1

# new model
mod3 <- lmer(sim_1 ~ contrast + (1 | participant_id) + (1 | contrast:participant_id), 
             data = d, REML = FALSE) 

or (using the categorical variable/factor condition)

mod4 <- lmer(sim_1 ~ condition + (1 | participant_id) + (1 | condition:participant_id), 
             data = d, REML = FALSE)

with

$$ \Sigma = \begin{bmatrix}\sigma^2_1 + \sigma^2_2 & \sigma^2_1\\ \sigma^2_1 & \sigma^2_1 + \sigma^2_2 \end{bmatrix} = \begin{bmatrix}\sigma^2_1 & \sigma^2_1\\ \sigma^2_1 & \sigma^2_1\end{bmatrix} + \begin{bmatrix}\sigma^2_2 & 0\\ 0 & \sigma^2_2\end{bmatrix} $$

where $\sigma^2_1$ and $\sigma^2_2$ are the variance parameters for the participant and the participant-condition-combination intercepts, respectively. Note that this $\Sigma$ implies a non-negative covariance, $\sigma^2_1$, between the two conditions.

Below we see that mod1, mod3, and mod4 yield equivalent fits:

# code snippet from Jake Westfall
d$contrast <- 2*(d$condition == 'experimental') - 1

mod1 <- lmer(sim_1 ~ contrast + (contrast || participant_id),
             data = d, REML = FALSE)

mod2 <- lmer(sim_1 ~ contrast + (contrast | participant_id),
             data = d, REML = FALSE)

# new models 
mod3 <- lmer(sim_1 ~ contrast + (1 | participant_id) + (1 | contrast:participant_id), 
             data = d, REML = FALSE) 

mod4 <- lmer(sim_1 ~ condition + (1 | participant_id) + (1 | condition:participant_id), 
             data = d, REML = FALSE)

anova(mod3, mod1)
# Data: d
# Models:
# mod3: sim_1 ~ contrast + (1 | participant_id) + (1 | contrast:participant_id)
# mod1: sim_1 ~ contrast + ((1 | participant_id) + (0 + contrast | participant_id))
#      Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
# mod3  5 2396.9 2420.3 -1193.5   2386.9                        
# mod1  5 2396.9 2420.3 -1193.5   2386.9     0      0          1

anova(mod4, mod3)
# Data: d
# Models:
# mod4: sim_1 ~ condition + (1 | participant_id) + (1 | condition:participant_id)
# mod3: sim_1 ~ contrast + (1 | participant_id) + (1 | contrast:participant_id)
#      Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
# mod4  5 2396.9 2420.3 -1193.5   2386.9                        
# mod3  5 2396.9 2420.3 -1193.5   2386.9     0      0          1

With treatment contrasts (the default in R) the re-parameterized $\Sigma$ is

$$ \Sigma = \begin{bmatrix}\sigma^2_1 & \sigma^2_1 + \sigma_{12}\\ \sigma^2_1 + \sigma_{12} & \sigma^2_1 + \sigma^2_2 + 2\sigma_{12} \end{bmatrix} = \begin{bmatrix}\sigma^2_1 & \sigma^2_1\\ \sigma^2_1 & \sigma^2_1\end{bmatrix} + \begin{bmatrix}0 & 0\\ 0 & \sigma^2_2\end{bmatrix} + \begin{bmatrix}0 & \sigma_{12}\\ \sigma_{12} & 2\sigma_{12}\end{bmatrix} $$

where $\sigma^2_1$ is the variance parameter for the intercept (condition $A$), $\sigma^2_2$ the variance parameter for the contrast ($B - A$), and $\sigma_{12}$ the corresponding covariance parameter.

We can see that neither setting $\sigma_{12}$ to zero nor setting $\sigma^2_2$ to zero tests (only) the hypothesis of equal variances.

However, as shown above, we can still use mod4 to test the hypothesis of equal variances, since changing the contrasts has no impact on the parameterization of $\Sigma$ for this model.
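
A minimal sketch of that comparison (mod_full is an illustrative name, not from the answers above): compare mod4 to a model in which the 2x2 covariance matrix is left unconstrained.

mod_full <- lmer(sim_1 ~ condition + (condition | participant_id),
                 data = d, REML = FALSE)

anova(mod4, mod_full)  # significant if constraining the two condition variances
                       # to be equal (compound symmetry) worsens the fit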

statmerkur
  • 1,115
  • 1
  • 9
  • 26