
Using R and the lme4 package, I've fitted the following model:

fit <- lmer(lung_capacity ~ calendar_year*treatment_group + age
            + sex + smoking_status + (1 | calendar_year:ID))

I want to examine how lung capacity evolves over the years in each of four treatment groups. The same patients are examined repeatedly over the years, in a highly unbalanced fashion. I use the lsmeans package to estimate the trends, with the following call:

trends <- lsmeans(fit, ~ treatment_group | calendar_year)

Variable explanation: lung_capacity = continuous variable describing the patient's functional lung capacity. ID = patient identifier (this is a repeated-measurements scenario). calendar_year = factor variable with one level for each year from 1990 to 2008. The remaining variables are self-explanatory.

I've chosen year as a random effect despite the fact that many experts consider year not to be random. However, there is a great deal of variance within the year variable, and the management of these patients changed significantly over the years.

The crude data show that age, sex, and smoking status (i.e., the things I want to adjust for) differed only slightly over the years. When I fit the model without the interaction between calendar year and ID, I obtain implausible results (judged by the deviation from the crude values and by subject-matter knowledge).

As I understand the random-effects specification (link), the model I fitted has fixed effects for calendar year plus random variation in the intercept among patients within calendar years.
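To make that reading concrete, here is a sketch of the model implied by `(1 | calendar_year:ID)`. The notation is my own assumption, not something taken from lme4's output: $i$ indexes the patient, $t$ the calendar year, and $k$ the repeated measurement within that patient-year.

$$
y_{itk} = \mathbf{x}_{itk}^{\top}\boldsymbol{\beta} + b_{it} + \varepsilon_{itk},
\qquad b_{it} \sim N(0, \sigma_b^2),
\qquad \varepsilon_{itk} \sim N(0, \sigma^2)
$$

Here $\mathbf{x}_{itk}^{\top}\boldsymbol{\beta}$ collects the fixed effects (calendar_year, treatment_group, their interaction, age, sex, smoking_status), and $b_{it}$ is a separate random intercept for each patient-year combination. If this reading is right, a patient-year with a single observation cannot by itself separate $b_{it}$ from $\varepsilon_{itk}$; information about $\sigma_b^2$ versus $\sigma^2$ would come from the patient-years with two or more observations. Note also that, under this term alone, the intercepts of the same patient in different years are treated as independent draws.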


Question: What if I have only one observation for some patients in a specific year? In each year, roughly 80% of patients have two or more observations, and the other 20% have only one observation that year.
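For anyone who wants to check this on their own data, a minimal base-R sketch of how the patient-year cell sizes could be tabulated is below. The data frame `dat` and its values are hypothetical toy data, not my real data. The last lines reflect my understanding that, by default, lmer refuses to fit when a grouping factor has as many levels as there are observations.

```r
# Toy data (hypothetical, for illustration only): 3 patients, unbalanced visits.
dat <- data.frame(
  ID            = factor(c(1, 1, 1, 2, 2, 3, 3, 3)),
  calendar_year = factor(c(1990, 1990, 1991, 1990, 1991, 1991, 1991, 1991))
)

# One level per patient-year combination that actually occurs in the data,
# mirroring the grouping factor in (1 | calendar_year:ID).
cell <- interaction(dat$calendar_year, dat$ID, drop = TRUE)

# Number of observations in each patient-year cell.
n_per_cell <- table(cell)

# Share of patient-years that contain exactly one observation.
prop_single <- mean(as.vector(n_per_cell) == 1)

# The random intercept needs fewer grouping levels than observations.
n_levels     <- nlevels(cell)
n_obs        <- nrow(dat)
identifiable <- n_levels < n_obs
```

If `identifiable` is `TRUE` and a reasonable share of cells have two or more observations, the single-observation cells should still contribute to the fixed effects while the replicated cells carry the information about the two variance components.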

I have not attached a reproducible example because I'm looking for a discussion on the concepts.

Cheers!

  • Welcome to CV. The concern with having only one observation in a year is an issue for frequentists but not for Bayesians. As noted in chap 13 of Gelman and Hill's book *Data Analysis Using Regression...*, in a Bayesian framework models are estimable based on a single observation since the analysis is focused on the posterior and Bayesian posteriors can be well populated. – Mike Hunter Nov 14 '15 at 12:55
  • Thanks for your reply @DJohnson. Unfortunately I'm not that comfy with Bayesian strategies, and I don't quite understand how I should interpret your comment. Would you say it is okay, despite having only 1 observation per year for some individuals? Would it necessitate any clarification regarding the methods I have chosen? I'm grateful for any help =) – David Schreibmuller Nov 15 '15 at 23:32
  • No clarification is needed. Bayesian approaches aren't for everyone. See this thread for related comments ... http://stats.stackexchange.com/questions/181842/comparing-pre-and-post-test-raw-scores-with-one-subject#comment345204_181842 – Mike Hunter Nov 15 '15 at 23:51
  • From what you describe, it probably is not going to be a problem having only one observation for some patients in some years. I think you'll get an error message if it is. On the other side of the coin, I don't think you can make a silk purse out of a sow's ear, regardless of whether your needle and thread are frequentist or Bayesian. True, the Bayesian method won't break as easily, but with little data, your results are largely a reflection of the prior that is used. – Russ Lenth Nov 16 '15 at 21:48
