Is it appropriate to include all factors as by-subject random slopes in a between-subjects design?

Question

I'm trying to understand the appropriate "maximal" random-effects structure for a mixed-effects model when there are interactions and a between-subjects design.

Let's start simple. Say you have an outcome variable, a treatment group and a control group, and you want to know whether the groups differ. So you're looking for a main effect of group, and you use outcome ~ group + (1|subject).

Now say your outcome variable is systematically affected by some other known variable, and the goal of the treatment is to prevent or reverse this effect. You have multiple measurements per subject at each level of this affecting variable. So you're looking for an interaction between group and the affecting variable, and you use outcome ~ group * affecting_var + (affecting_var|subject).

Finally, let's say that the goal of the treatment is to prevent or reverse the effect of the affecting variable, but that the treatment takes effect gradually over the course of the measurements. So you're looking for a three-way interaction among group, the affecting variable, and time. Do you use outcome ~ group * affecting_var * time + (affecting_var * time|subject)? Or should the random-effects structure simply be (affecting_var|subject)? Or, if it depends, what does it depend on?

My thoughts: since it's a between-subjects design and what I'm interested in is a group*affecting_var*time interaction, my instinct is that putting (affecting_var * time|subject) in the random-effects structure will factor out all the variation I'm interested in. In the second situation above, it's clear to me that a random slope of affecting_var is needed, because this is a known effect and effect sizes across people vary. But time is only of interest because the treatment takes effect gradually; if not for the treatments I would not expect any effects or interactions with time. But it seems to me that the default in the "keep it maximal" camp is to put every possible variable and interaction in the random-effects structure (if its inclusion is supported by an ANOVA), and leaving terms out raises suspicions.
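A point the "keep it maximal" default leaves implicit is that a by-subject random slope needs a predictor that varies within subjects. The minimal Python sketch below builds a hypothetical toy version of the design in the question (4 subjects, 3 levels of affecting_var, 3 time points, every combination measured on every subject; make_design and varies_within_subject are illustrative names, not library functions) and checks which predictors qualify. Because each subject belongs to exactly one group, there is no within-subject variation in group from which a subject-specific group effect could be estimated, whereas affecting_var and time are candidates for by-subject slopes.

```python
from itertools import product


def make_design(n_subjects=4, affecting_levels=(0, 1, 2), times=(1, 2, 3)):
    """Hypothetical toy design: group is fixed per subject (between subjects),
    while every subject is measured at every affecting_var x time combination."""
    rows = []
    for s in range(n_subjects):
        group = "treatment" if s % 2 == 0 else "control"  # constant within subject
        for a, t in product(affecting_levels, times):
            rows.append({"subject": s, "group": group, "affecting_var": a, "time": t})
    return rows


def varies_within_subject(rows, var):
    """True if `var` takes more than one value inside at least one subject."""
    values_by_subject = {}
    for r in rows:
        values_by_subject.setdefault(r["subject"], set()).add(r[var])
    return any(len(values) > 1 for values in values_by_subject.values())


design = make_design()
for var in ("group", "affecting_var", "time"):
    print(var, varies_within_subject(design, var))
# group False
# affecting_var True
# time True
```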

EdM · Accepted Answer · 2022-02-01T15:54:35.020

I don't know that there's an appropriate answer to this question in general.

The more interactions you add to a model, the more coefficients you have to estimate and the higher the risk that you will lose power as a result. Including interaction terms in random effects also means assuming (perhaps unconsciously) a correlation structure among the random effects; see this thread, for example.
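To make the cost concrete, here is a rough Python count of how fast the random-effects structure grows. It assumes affecting_var and time are numeric (one column each; factors would contribute one column per non-reference level) and that the random effects are allowed to be correlated, as with the single-bar (affecting_var * time|subject) syntax. expand_terms and n_cov_params are hypothetical helpers for illustration only.

```python
from itertools import combinations


def expand_terms(rhs):
    """Expand an R-style 'a * b * c' into intercept, main effects and all
    interactions, assuming every variable contributes a single numeric column."""
    variables = [v.strip() for v in rhs.split("*")]
    terms = ["(Intercept)"]
    for order in range(1, len(variables) + 1):
        for combo in combinations(variables, order):
            terms.append(":".join(combo))
    return terms


def n_cov_params(k, correlated=True):
    """Variance-covariance parameters for k random effects per subject:
    k variances plus k(k-1)/2 covariances if correlated, else just k variances."""
    return k * (k + 1) // 2 if correlated else k


# Fixed effects for the three-way model (group coded as one 0/1 column).
print(len(expand_terms("group * affecting_var * time")))  # 8

for rhs in ("affecting_var", "affecting_var * time"):
    k = len(expand_terms(rhs))
    print(rhs, k, n_cov_params(k), n_cov_params(k, correlated=False))
# affecting_var 2 3 2
# affecting_var * time 4 10 4
```

Going from (affecting_var|subject) to (affecting_var * time|subject) therefore raises the number of variance-covariance parameters from 3 to 10, of which 6 describe correlations among the random effects. lme4's double-bar syntax, (affecting_var * time || subject), drops those correlations for numeric predictors, leaving 4 variances.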

This question sounds like it's been written from the perspective of analyzing a study that's already been performed. In that situation you might find that you don't have enough data to estimate all of the interactions and random effects of interest.

It's better to start with these considerations during the study design phase. Decide which particular interactions you want to evaluate as fixed or random effects, then design a study of adequate scope to accomplish that.

In response to comment

Quoting from the question:

In the second situation above, it's clear to me that a random slope of affecting_var is needed, because this is a known effect and effect sizes across people vary. But time is only of interest because the treatment takes effect gradually; if not for the treatments I would not expect any effects or interactions with time.

Your argument that a "random slope of affecting_var is needed" (emphasis added) could just as easily be applied to the affecting_var*time interaction if different individuals respond differently over time to affecting_var. If that type of variation among individuals is of interest, then that's a way to model it. The danger, depending on the model and the data, is that you might end up with a random effect that has as many unique values as there are observations, and thus with a model that can't be fit.
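A minimal Python sketch of how that can happen, under the hypothetical assumption that affecting_var and time are coded as factors (3 and 4 levels here) and each subject contributes exactly one observation per affecting_var-by-time cell. With the full interaction as a by-subject random effect, each subject then has one random coefficient per cell, exactly as many as its observations, which leaves nothing to separate the random effects from the residual error. The helper names are illustrative.

```python
def random_effects_per_subject(n_affecting_levels, n_time_levels):
    # Factors with the full interaction (affecting_var * time | subject):
    # one random coefficient per cell of the affecting_var x time table.
    return n_affecting_levels * n_time_levels


def obs_per_subject(n_affecting_levels, n_time_levels, reps_per_cell):
    return n_affecting_levels * n_time_levels * reps_per_cell


def more_obs_than_random_effects(n_a, n_t, reps, n_subjects):
    """Necessary (not sufficient) condition for separating the random
    effects from the residual variance: strictly more observations."""
    n_re = random_effects_per_subject(n_a, n_t) * n_subjects
    n_obs = obs_per_subject(n_a, n_t, reps) * n_subjects
    return n_obs > n_re


print(more_obs_than_random_effects(3, 4, reps=1, n_subjects=20))  # False: 240 vs 240
print(more_obs_than_random_effects(3, 4, reps=2, n_subjects=20))  # True: 480 vs 240
```

lme4, for example, stops with an error by default when a term's number of random effects is not smaller than the number of observations (see the check.nobs.vs.nRE option of lmerControl). Passing that check is necessary, not sufficient: a model with only somewhat more observations than random effects can still be too poorly determined to fit well.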

Whether you need to take that variation among individuals into account in precisely the way you write is a different story. If all you need to do is account for intra-individual correlations, you might not need a mixed model at all. Frank Harrell makes a case for generalized least squares in Chapter 7 of his course notes and book, where he outlines the relative advantages of several approaches to longitudinal data analysis beyond mixed models.
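One way to see the connection between the two approaches: a random intercept by itself already implies a within-subject correlation structure. Under outcome ~ ... + (1|subject), with random-intercept variance tau2 and residual variance sigma2, every pair of a subject's observations has correlation tau2 / (tau2 + sigma2) (compound symmetry). Generalized least squares instead specifies the within-subject correlation directly and can use structures a random intercept can't produce, such as AR(1), where correlation decays with the lag between measurements. The Python sketch below uses hypothetical values (tau2 = 1, sigma2 = 3, phi = 0.5) for four time points; AR(1) is shown only as one common example, not necessarily the structure Harrell recommends.

```python
def compound_symmetry(n_times, tau2, sigma2):
    """Marginal covariance of one subject's observations under a random
    intercept: tau2 in every entry, plus sigma2 on the diagonal."""
    return [[tau2 + (sigma2 if i == j else 0.0) for j in range(n_times)]
            for i in range(n_times)]


def to_correlation(cov):
    """Convert a covariance matrix (list of lists) to a correlation matrix."""
    n = len(cov)
    return [[cov[i][j] / (cov[i][i] * cov[j][j]) ** 0.5 for j in range(n)]
            for i in range(n)]


def ar1_correlation(n_times, phi):
    """AR(1) correlation: phi ** |i - j| between measurements i and j."""
    return [[phi ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


cs = to_correlation(compound_symmetry(4, tau2=1.0, sigma2=3.0))
ar = ar1_correlation(4, phi=0.5)
print(cs[0])  # [1.0, 0.25, 0.25, 0.25]: same correlation at every lag
print(ar[0])  # [1.0, 0.5, 0.25, 0.125]: correlation decays with lag
```

In R, nlme::gls with a correlation argument such as corCompSymm or corAR1 fits models of this kind.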

In the realm of mixed models, just what should be considered "fixed" versus "random" effects and whether the same predictor should be included in both ways is not an easy question. Ben Bolker's GLMM FAQ page discusses these issues and has links to further discussion. In particular with respect to including a predictor both ways, this linked page notes: "Much will depend on the nature of the variable" and illustrates with examples.

Even in a mixed model, there are alternate ways to specify interactions. Douglas Bates has a presentation on interactions in mixed models. He distinguishes between vector-valued random effects of the type that you write and scalar interaction terms:

Different ways of expressing such interactions lead to different numbers of random effects. These different definitions have different levels of complexity, affecting both their expressive power and the ability to estimate all the parameters in the model.
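The counting behind that distinction can be sketched in Python for a hypothetical within-subject factor f with k levels (here k = 3). The vector-valued term (f|subject) gives each subject k correlated random effects with k(k+1)/2 variance-covariance parameters. The scalar terms (1|subject) + (1|subject:f) give each subject 1 + k random effects but only 2 variance parameters, at the price of assuming that, within a subject, every level of f is equally correlated with every other level. The function names are illustrative.

```python
def vector_valued(k):
    # (f | subject) for a k-level factor: intercept + (k - 1) contrasts,
    # i.e. k correlated random effects per subject.
    return {"random_effects_per_subject": k, "variance_params": k * (k + 1) // 2}


def scalar_terms(k):
    # (1 | subject) + (1 | subject:f): one subject-level effect plus one
    # effect per subject-by-level cell, each term with a single variance.
    return {"random_effects_per_subject": 1 + k, "variance_params": 2}


for k in (2, 3, 4):
    print(k, vector_valued(k), scalar_terms(k))
```

The vector-valued form is more expressive (it lets levels differ in variance and in how strongly they correlate), but its parameter count grows quadratically in k, which is the estimability trade-off the quoted passage describes.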

So the answer depends on the goals of your study (for example, whether you want explicitly to model particular random effects or are just trying to account for intra-individual correlations), the nature of the data, and the way that you wish to model the structure of the random effects.

Thanks for your input. I'm not asking about a specific study, though I had an example in mind when writing the question; this comes up a lot and always confuses me. So the question is general, but not "is it always appropriate?" so much as "is it ever appropriate, and if so, when?" Maybe there's nothing "special" about a between-subjects design in which the effect of interest is an interaction, but a lot of work in my subfield looks like that, and it always feels to me like it should be treated differently. — emily, Jan 31 '22 at 15:43
@emily I've expanded on the answer, with some links that might be helpful. — EdM, Feb 01 '22 at 15:55
