Using R and the lme4 package, I've fitted the following model:
```r
library(lme4)
# `dat` is a placeholder name for the analysis data frame
fit <- lmer(lung_capacity ~ calendar_year * treatment_group + age +
              sex + smoking_status + (1 | calendar_year:ID), data = dat)
```
I want to examine how lung capacity evolves over the years in each of the four treatment groups. The same patients are examined repeatedly over the years, in a highly unbalanced fashion. I use the lsmeans package to estimate the trends with the following call:
```r
library(lsmeans)
trends <- lsmeans(fit, ~ treatment_group | calendar_year)
```
Variables:

- `lung_capacity`: continuous variable describing the patient's functional lung capacity.
- `ID`: patient identifier (this is a repeated-measurements setting).
- `calendar_year`: factor with one level for each year from 1990 to 2008.
- The remaining variables are self-explanatory.
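To see how unbalanced the design is, one can count the observations in each patient-year cluster, i.e. each level of `calendar_year:ID`. The sketch below uses a made-up toy data frame `dat` with only the `ID` and `calendar_year` columns; the values are hypothetical and only illustrate the bookkeeping.

```r
# Hypothetical toy data: only the ID and calendar_year columns are needed here
dat <- data.frame(
  ID            = factor(c(1, 1, 1, 2, 3, 3, 4, 4, 4, 5)),
  calendar_year = factor(c(1990, 1990, 1991, 1990, 1990, 1990, 1991, 1991, 1990, 1991))
)

# Observations per patient-year cluster (one cluster per level of calendar_year:ID)
sizes <- as.data.frame(table(calendar_year = dat$calendar_year, ID = dat$ID))
sizes <- sizes[sizes$Freq > 0, ]  # keep only patient-years that were actually observed

# Share of clusters with a single observation, per calendar year and overall
prop_singleton <- tapply(sizes$Freq == 1, sizes$calendar_year, mean)
overall_singleton <- mean(sizes$Freq == 1)
prop_singleton
overall_singleton
```

With the real data frame in place of the toy `dat`, `prop_singleton` gives, for each year, the share of observed patients who have only one observation in that year.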
I have included year in the random-effects structure, even though many experts argue that year should not be treated as random. My reasoning is that there is a great deal of variation within years, and the management of these patients changed substantially over the period.
The crude data show that age, sex, and smoking status (i.e., the covariates I want to adjust for) differed only slightly over the years. When I fit the model without the calendar year by ID interaction in the random-effects term, I obtain implausible results (judged by the deviation from the crude values and by subject-matter knowledge).
As I understand it (link), the random-effects part of my model means that it fits fixed effects for calendar year plus a random intercept for each patient within each calendar year, i.e. random variation in intercepts among patients within calendar years.
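Written out in my own notation (a sketch; the symbols are my choice, not lme4 output), with $i$ indexing patients, $t$ calendar years, $j$ the observations of patient $i$ within year $t$, $g$ the treatment group of the observation, and $\mathbf{x}_{ijt}$ the adjustment covariates (age, sex, smoking status), the fitted model is:

$$
y_{ijt} = \mu + \alpha_t + \tau_g + (\alpha\tau)_{tg} + \mathbf{x}_{ijt}^{\top}\boldsymbol{\beta} + b_{it} + \varepsilon_{ijt},
\qquad b_{it} \sim \mathcal{N}(0, \sigma_b^2), \quad \varepsilon_{ijt} \sim \mathcal{N}(0, \sigma^2)
$$

Here $y_{ijt}$ is `lung_capacity`, $\alpha_t$, $\tau_g$ and $(\alpha\tau)_{tg}$ are the fixed calendar-year, treatment-group and interaction effects, and $b_{it}$ is the random intercept for patient $i$ in year $t$ (one per level of `calendar_year:ID`), assumed independent of $\varepsilon_{ijt}$. For a patient-year with a single observation, only the sum $b_{it} + \varepsilon_{i1t}$ is observed, so the separation of $\sigma_b^2$ from $\sigma^2$ rests on the patient-years with two or more observations.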
**Question: What if some patients have only 1 observation in a specific year?** For each year, roughly 80% of patients have 2 or more observations, and the other 20% have only 1 observation in that year.
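One way to explore this question is a small simulation. The sketch below is hypothetical throughout: 200 patients, six years (1990 to 1995), a true patient-within-year intercept SD of 2, a residual SD of 1, and about 80% of patient-year clusters with two visits and 20% with one. Treatment groups and covariates are left out to keep it short. If `lmer()` recovers both SDs reasonably well, that suggests singleton clusters are not a problem in themselves, although it says nothing about whether the model suits the real data.

```r
library(lme4)

set.seed(2024)
n_id  <- 200          # hypothetical number of patients
years <- 1990:1995    # hypothetical subset of calendar years

# One row per patient-year cluster; about 80% get two visits, 20% get one
clusters <- expand.grid(ID = factor(seq_len(n_id)), calendar_year = years)
clusters$n_obs <- ifelse(runif(nrow(clusters)) < 0.8, 2L, 1L)
clusters$u <- rnorm(nrow(clusters), sd = 2)  # true patient-within-year intercept SD = 2

# Expand to one row per visit and add residual noise (true residual SD = 1)
sim <- clusters[rep(seq_len(nrow(clusters)), clusters$n_obs), ]
sim$lung_capacity <- 50 + 0.5 * (sim$calendar_year - 1990) + sim$u +
  rnorm(nrow(sim), sd = 1)
sim$calendar_year <- factor(sim$calendar_year)

fit_sim <- lmer(lung_capacity ~ calendar_year + (1 | calendar_year:ID), data = sim)
vc <- as.data.frame(VarCorr(fit_sim))
vc[, c("grp", "sdcor")]  # estimated SDs for calendar_year:ID (true 2) and Residual (true 1)
```

Lowering the `0.8` in the `ifelse()` call shows how the estimates behave as the share of singleton clusters grows.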
I have not attached a reproducible example because I'm looking for a discussion of the concepts.
Cheers!