I have a large database with repeated measures of Y at various times. Y is continuous, and I know that its evolution is usually modified by numerous baseline confounders.
I am trying to fit a mixed model. Here is a small reprex in R, although I hope the question won't be too R-specific.
My actual dataset is much larger (300k+ rows in long format) and the confounders can be continuous or categorical, but the spirit is the same. In this example, let's say X1 is the variable I hypothesize influences the evolution of Y over time, and X2 is a confounder.
library(dplyr)
library(lme4)
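# Repeated measures in long format: 3 subjects, 5 measurements each
df.long = data.frame(
  id = rep(c("A", "B", "C"), each = 5),
  time = c(1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 5, 6, 7, 8, 9),
  y = c(25, 32, 35, 37, 40, 55, 51, 59, 57, 60, 10, 15, 20, 30, 45)
)
# Baseline covariates: one row per subject
df.baseline = data.frame(
  id = c("A", "B", "C"),
  x1 = c(98, 42, 23),
  x2 = c(250, 390, 527)
)
# Attach the baseline covariates to every measurement
df = df.long %>% left_join(df.baseline, by = "id")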
df
#   id time  y x1  x2
# 1  A    1 25 98 250
# 2  A    2 32 98 250
# 3  A    3 35 98 250
# 4  A    4 37 98 250
# 5  A    5 40 98 250
# 6  B   10 55 42 390
# 7  B   11 51 42 390
# 8  B   12 59 42 390
# ...
I've seen a whole lot of resources, but very few discuss measurements taken at irregular times, and even fewer discuss adjusting for confounding variables.
In my notes, there is a short paragraph about time, saying that you should specify a Toeplitz-like covariance matrix, since time3 is more strongly associated with time2 than with time1.
But since I don't have just time1 and time3, but a huge set of different times with unequal gaps between them, how can I build such a covariance matrix? And then how can I apply it to my model?
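To make the question concrete, here is a minimal sketch of what a "Toeplitz-like" structure could look like with unequal gaps, assuming a continuous-time AR(1) form in which the correlation decays with the time distance between two measurements. The time vector and the value phi = 0.8 are illustrative assumptions, not estimates from my data, and this is only one candidate structure among several.

```r
# Illustrative measurement times with unequal gaps (assumed, not real data)
times <- c(1, 2, 4, 7, 8)

# Assumed correlation between two measurements one time unit apart
phi <- 0.8

# Continuous-time AR(1): cor(t_i, t_j) = phi^|t_i - t_j|
lag <- abs(outer(times, times, "-"))
R <- phi^lag
dimnames(R) <- list(times, times)

# Correlation shrinks as the gap between measurement times grows
round(R, 3)
```

With equally spaced times this reduces to an ordinary AR(1) matrix, which is a special case of a Toeplitz matrix. In nlme, this kind of structure is what corCAR1(form = ~ time | id) is meant to provide, passed through the correlation argument of lme() or gls(); as far as I know, lmer() from lme4 has no equivalent argument.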
Also, how should I put confounders in my model? Some sources suggest interacting them with time (time*x1*x2), but with many confounders this makes little sense to me.
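To illustrate why the full interaction worries me, the sketch below only expands two candidate formulas with base R's terms(); nothing is fitted. The second formula, in which each baseline covariate interacts with time but not with the other covariates, is an assumption about what a more parsimonious specification could look like, not a recommendation I have seen confirmed. The value k = 10 is an arbitrary illustrative number of confounders.

```r
# Full three-way interaction, as in time*x1*x2
full <- attr(terms(y ~ time * x1 * x2), "term.labels")

# Alternative: each baseline covariate interacts with time only
slopes <- attr(terms(y ~ time * (x1 + x2)), "term.labels")

full    # 7 terms, including x1:x2 and time:x1:x2
slopes  # 5 terms: time, x1, x2, time:x1, time:x2

# Number of fixed-effect terms with k numeric covariates
k <- 10
2^(k + 1) - 1  # full interaction of time with all k covariates
2 * k + 1      # time, k main effects, k time-by-covariate terms
```

The full interaction grows exponentially with the number of confounders, whereas the time-by-covariate version grows linearly; categorical confounders would add more coefficients per term in both cases.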
For the record, the best model I've come up with so far (based on this answer) is:
lmeModel = lmer(y ~ time + x1 + x2 + (1+ time|id), data=df)
As far as I understand, it has a random effect on id and should account for the effect of time for each id. But it is not clear to me whether I should keep time as a standalone term, nor whether the covariance matrix is OK.
DISCLAIMER: this may be a confusing question, but please help me to improve it if you think so.