I have a large database with repeated measures of Y at various times. Y is continuous, and I know that its evolution is usually modified by numerous baseline confounders.

I am trying to fit a mixed model. Here is a small reprex in R. I hope the question is not too R-specific.

My actual dataset is much larger (300k+ rows in long format), and the confounders can be continuous or categorical, but the spirit is the same. In this example, let's say X1 is the variable I hypothesize influences the evolution of Y over time, and X2 is a confounder.

library(dplyr)
library(lme4)

df.long = data.frame(
  id=c(rep(c("A", "B", "C"), each=5)),
  time=c(1,2,3,4,5,10,11,12,13,14,5,6,7,8,9),
  y=c(25,32,35,37,40,55,51,59,57,60,10,15,20,30,45)
)    
df.baseline = data.frame(
  id=c("A", "B", "C"),
  x1=c(98, 42, 23),
  x2=c(250,390,527)
)

df = df.long %>% left_join(df.baseline, by="id")
df
#    id time  y x1  x2
# 1   A    1 25 98 250
# 2   A    2 32 98 250
# 3   A    3 35 98 250
# 4   A    4 37 98 250
# 5   A    5 40 98 250
# 6   B   10 55 42 390
# 7   B   11 51 42 390
# 8   B   12 59 42 390
#...

I've seen a lot of resources, but very few discuss subject-specific measurement times, and even fewer discuss adjusting for confounding variables.

In my notes, there is a short paragraph about time saying that you should specify a particular, Toeplitz-like covariance matrix, since time3 is more strongly associated with time2 than with time1.

But since I don't have a time1 and a time3, but rather a huge set of different times with unequal gaps between them, how can I build such a covariance matrix? And how can I then apply it to my model?

Also, how should I include confounders in my model? Some sources suggest interactions with time (time*x1*x2), but with many confounders that makes little sense to me.

For the record, the best model I've come up with so far (based on this answer) is:

lmeModel = lmer(y ~ time + x1 + x2 + (1+ time|id), data=df)

As far as I understand, it has a random effect on id and should account for the effect of time within each id. But it is not clear to me whether I should keep time as a standalone fixed effect, or whether the resulting covariance matrix is OK.

DISCLAIMER: this may be a confusing question; if you think so, please help me improve it.

Dan Chaltiel

1 Answer

A couple of points:

  • A general model-building strategy for mixed models is to start with an elaborate, flexible specification of the fixed-effects part, including nonlinear and interaction terms. Then, keeping this elaborate fixed-effects structure, you build the random-effects part. Typically, you start with a random intercept (which corresponds to constant correlation over time) and then move to random slopes and possibly higher-order terms, such as nonlinear random slopes. Depending on the features of your study, you may also need to include random effects for other grouping factors (e.g., in a multilevel design). After you have selected your random effects, you can return to the fixed-effects part and try to simplify the model, starting by checking whether you need the complex nonlinear and interaction terms. For more on this, you can have a look at Sections 3.1, 3.2 (also useful 2.4) and 3.9 of my course notes.

  • When you include random slopes in the random-effects part, you do indeed assume that measurements closer together in time are more strongly correlated than measurements further apart. To get a better intuition for how random effects capture correlations, you can have a look at Section 3.3 of the shiny app for my course mentioned above.

  • Indeed, including three-way interactions of time with x1 and x2 can make interpretation difficult. Typically, for longitudinal data I only consider interactions of time with the other variables. See again Sections 2.4 and 3.2.
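
The strategy in the points above can be sketched in R with lme4. This is only an illustration under stated assumptions: the data are simulated (the three-subject reprex is too small to estimate random slopes), and the names `sim`, `m_int`, `m_slp`, `m_full`, `m_red`, `lr_re`, `lr_fe`, `t_new` and `V` are hypothetical. The last step computes the marginal covariance implied by the random intercept and slope, Z G Z' + sigma^2 I, at three unequally spaced times, to show that the implied correlations depend on the actual measurement times.

```r
library(lme4)

set.seed(42)

# --- Simulated data (an assumption, not the asker's dataset) ---------------
n_id  <- 60                                   # number of subjects
n_obs <- 5                                    # visits per subject
sim <- data.frame(id = rep(seq_len(n_id), each = n_obs))
# Irregular, subject-specific times: cumulative sums of random gaps
sim$time <- ave(runif(nrow(sim), 0.5, 3), sim$id, FUN = cumsum)
x1 <- rnorm(n_id)                             # baseline variable of interest
x2 <- rnorm(n_id)                             # baseline confounder
b0 <- rnorm(n_id, sd = 5)                     # true random intercepts
b1 <- rnorm(n_id, sd = 1)                     # true random slopes
i  <- sim$id                                  # integer subject index
sim$x1 <- x1[i]
sim$x2 <- x2[i]
sim$y  <- 20 + b0[i] + 1.5 * x2[i] +
  (2 + 0.5 * x1[i] + b1[i]) * sim$time + rnorm(nrow(sim), sd = 2)
sim$id <- factor(sim$id)

# --- Step 1: elaborate fixed effects, simplest random-effects part ----------
# time * (x1 + x2) gives main effects plus two-way interactions with time only
m_int <- lmer(y ~ time * (x1 + x2) + (1 | id), data = sim, REML = TRUE)

# --- Step 2: extend the random-effects part, same fixed effects -------------
m_slp <- lmer(y ~ time * (x1 + x2) + (1 + time | id), data = sim, REML = TRUE)
# Compare the two REML fits without refitting by ML
lr_re <- anova(m_int, m_slp, refit = FALSE)
print(lr_re)

# --- Step 3: try to simplify the fixed effects (ML fits) --------------------
m_full <- lmer(y ~ time * (x1 + x2) + (1 + time | id), data = sim, REML = FALSE)
m_red  <- lmer(y ~ time * x1 + x2 + (1 + time | id), data = sim, REML = FALSE)
lr_fe <- anova(m_red, m_full)                 # is the time:x2 interaction needed?
print(lr_fe)

# --- Implied marginal covariance at unequally spaced times ------------------
t_new <- c(1, 2, 5)
Z <- cbind(1, t_new)                          # random-effects design matrix
G <- as.matrix(VarCorr(m_slp)$id)             # 2 x 2 random-effects covariance
V <- Z %*% G %*% t(Z) + sigma(m_slp)^2 * diag(length(t_new))
print(round(cov2cor(V), 3))                   # correlations among the 3 times
```

Because the covariance is built from the time values themselves, no separate Toeplitz-type matrix has to be constructed by hand for unequally spaced measurements. Comparing `m_int` with `m_slp` involves variance parameters on the boundary of their parameter space, so the p-value reported by `anova` should be read as approximate.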

Dimitris Rizopoulos