Nested linear mixed modeling including a factor applicable to one subject group only

Question

I am trying to build a nested linear mixed model (LMM) where 2 subject groups are compared, each with a set of measurements (time series). The outcome measure (dependent variable) is speech understanding.

One subject group is a patient group wearing an auditory prosthetic (a cochlear implant, or CI), the other a control group (normal hearing, or NH). A total of 32 subjects are included.

For the CI group we know that the time they used their CI plays a role in their outcome. We wish to take this factor into account in the LMM.

The problem: time of CI use cannot be used as a fixed-effects covariate because it doesn't apply to the control group, since they don't have an implant. Labeling them as '0 years' would also not be OK, because that would imply they have a CI but haven't used it.

So how should I include time-of-device-use in the LMM? I tried fumbling around with the addition of random variables besides the 'subject ID' but that resulted in error messages in SPSS, as it can't seem to build the model anymore with more than 1 random factor (possibly because of the small subject population).

In short: can I add a factor into a nested LMM that applies to one group of subjects only, and if yes, how? I'm using SPSS and I am not proficient with R, unfortunately.

One solution is here: https://stats.stackexchange.com/questions/372257/how-do-you-deal-with-nested-variables-in-a-regression-model/372258#372258 — kjetil b halvorsen, May 16 '21 at 19:47
@kjetilbhalvorsen (+1) Good one. In fairness, most of these options are hard to justify in the current situation but they are definitely something for the OP to see, thanks! — usεr11852, May 16 '21 at 21:36

score 2 · Accepted Answer · answered May 15 '21 at 16:50

This is a very interesting question. Simply put, using a standard L(M)M design we cannot account for a factor that is applicable to one group of subjects only. As you correctly identified, setting the control group as having "0 years" is misleading.

I can think of only one "quick" solution which is not perfect in itself:

Use a "year-of-CI-used" adjusted outcome instead of the "raw" outcome. We define a new outcome variable that is adjusted for how many years our treatment subjects have been using their CI. That way we can go forward without having to include it as a fixed-effect covariate in our LMM. We would estimate what the outcome of each of our 16 treatment subjects would be after, say, 5 years of using their CI, and then take that estimate as the response variable in our LMM; i.e. our new outcome variable would be $E\{y|x_{years} = 5 \}$ for our 16 treatment subjects. Such two-stage regression approaches have been pretty common, especially early on; indeed, some of the early influential work on the matter took this two-stage approach directly (e.g. see Baron & Kenny (1986) The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations).
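As a rough illustration of the adjustment step, here is a minimal Python sketch. It is not the exact procedure from any of the references: the function names (`ols_slope`, `adjust_ci_scores`), the data layout and the toy numbers are all made up for illustration, and stage one is simplified to an ordinary least-squares fit of each CI subject's mean score on years of use rather than an LMM fitted to the CI group. The reference time defaults to the mean time of device use instead of a fixed 5 years.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def adjust_ci_scores(ci_subjects, ref_years=None):
    """Shift each CI subject's scores to a common reference time of CI use.

    ci_subjects: dict mapping subject id -> (years_of_use, [scores]).
    Hypothetical layout, chosen only for this sketch.
    """
    years = [yrs for yrs, _ in ci_subjects.values()]
    subject_means = [mean(scores) for _, scores in ci_subjects.values()]

    # Stage 1 (simplified): between-subject slope of outcome on years of use.
    slope = ols_slope(years, subject_means)

    # Reference time: the mean time of device use unless one is supplied.
    if ref_years is None:
        ref_years = mean(years)

    # Stage 2: remove the part of each score attributed to deviating
    # from the reference time of use.
    adjusted = {
        sid: [s - slope * (yrs - ref_years) for s in scores]
        for sid, (yrs, scores) in ci_subjects.items()
    }
    return slope, ref_years, adjusted


# Toy data (invented): three CI subjects with two measurements each.
ci_subjects = {
    "CI01": (2.0, [50.0, 54.0]),
    "CI02": (4.0, [60.0, 64.0]),
    "CI03": (6.0, [70.0, 74.0]),
}
slope, ref_years, adjusted = adjust_ci_scores(ci_subjects)
print(slope, ref_years, adjusted)
```

The adjusted CI scores would then be combined with the untouched control-group scores and used as the response in the LMM. Note that the second stage treats the estimated slope as if it were known exactly, so the uncertainty from stage one is not carried over into the final model.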

In general, I think the issue described is related to mediation analysis. I would suggest investing the time to read through a "general introduction" to the subject like VanderWeele (2016) Mediation Analysis: A Practitioner's Guide or a similar text discussing causal mediation methods. The scenario outlined in the question unfortunately does not lend itself to standard approaches because, as mentioned, the mediation variable is nonsensical for the control group. Rudolph et al. (2018) Causal mediation analysis with observational data: considerations and illustration examining mechanisms linking neighborhood poverty to adolescent substance use has a nice introduction on the matter too.

Thanks! +1 for the well-referenced answer, beauty. I'd like to ask for a clarification if you please. Am I correct that you mean I should follow the following procedure to calculate the *year-of-CI-used* adjusted dummy outcome: (1) run an LMM first on the CI group only and look what the slope of the covariate is; (2) fill in 5 years for every subject and correct the outcome measure for slope*time. (3) Basically what you would do is thus removing the variation posed by CI use by setting them all at 1 time point? ... — AliceD, May 16 '21 at 18:55
Wouldn't it make more sense to set it at 0 years to remove the effect altogether? Or possibly even better, setting it at the average time of device use? — AliceD, May 16 '21 at 18:57
Yes, you got it right. And yes, the average time of device use is probably the best choice. I used 5 years as an example; I would also pick the average or median time of device use in reality. — usεr11852, May 16 '21 at 21:29
