Let's say I have outcome data at four time-points (baseline, 3 months, 6 months, 12 months) which I want to regress on an explicit time variable ($t_1 = 0$, $t_2 = 1$, $t_3 = 2$, $t_4 = 3$) to understand linear change.
I typically adjust for baseline differences in the outcome using a random intercept, e.g.:
$$Y_{it} = \beta_0 + \beta_1Time_{it} + U_i + e_{it} $$
where $i$ indexes subjects, $t$ indexes time, $\beta_0$ is the fixed intercept, $\beta_1$ is the slope on the explicit time variable, $U_i$ is the random intercept, and $e_{it}$ is the subject- and time-varying error.
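In case it helps, here is a minimal sketch of how I would fit that first model in Python with statsmodels' `MixedLM` (the file name and the columns `subject`, `time`, `y` are placeholders for my actual data):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Long-format data: one row per subject per time-point.
# Assumed columns: subject (ID), time (coded 0, 1, 2, 3), y (outcome).
df = pd.read_csv("outcome_long.csv")  # hypothetical file name

# Random-intercept model: Y_it = b0 + b1*Time_it + U_i + e_it
m1 = smf.mixedlm("y ~ time", data=df, groups=df["subject"])
fit1 = m1.fit()
print(fit1.summary())
```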
However, my supervisor adjusts for baseline differences by including the baseline measurement as a covariate in addition to a random intercept, e.g.:
$$Y_{it} = \beta_0 + \beta_1Time_{it} + \beta_2Baseline_i + U_i + e_{it} $$
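My supervisor's model would look something like the sketch below. I am assuming here (my assumption, not stated above) that each subject's baseline value is copied onto their follow-up rows and that the baseline row itself is dropped from the outcome, which is a common convention when baseline enters as a covariate:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same long-format data as before (hypothetical file and column names).
df = pd.read_csv("outcome_long.csv").sort_values(["subject", "time"])

# Carry each subject's t = 0 measurement onto all of their rows.
df["baseline"] = df.groupby("subject")["y"].transform("first")

# Keep only follow-up rows so the baseline value is not also the outcome.
followup = df[df["time"] > 0]

# Y_it = b0 + b1*Time_it + b2*Baseline_i + U_i + e_it
m2 = smf.mixedlm("y ~ time + baseline", data=followup,
                 groups=followup["subject"])
fit2 = m2.fit()
print(fit2.summary())
```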
I know that other people adjust for baseline variation in the outcome by including only the baseline measurement as a covariate, with no random intercept.
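As I understand it, that third approach would just be an ordinary regression on the follow-up rows with no subject-level random effect, something like this (again a sketch with hypothetical names; note it ignores the within-subject correlation across the follow-up time-points unless the standard errors are adjusted):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same hypothetical data preparation as above.
df = pd.read_csv("outcome_long.csv").sort_values(["subject", "time"])
df["baseline"] = df.groupby("subject")["y"].transform("first")
followup = df[df["time"] > 0]

# Baseline as a covariate, no random intercept: plain OLS on follow-ups.
m3 = smf.ols("y ~ time + baseline", data=followup)
fit3 = m3.fit()
print(fit3.summary())
```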
My questions are:
- Which of the above approaches is valid for adjusting for baseline differences (if any) and why?
- In particular, is it appropriate to adjust for baseline variation with a random intercept and no baseline covariate, and why?
- Do you have any references on the topic?