I realized my original post was really two questions, so I'm splitting it up. My other post describes the same problem and asks whether it can be solved using a mixed effects model.

My data include a baseline with up to four follow-up observations (but with lots of missing follow-ups). I'm interested in predicting whether a particular binary outcome is present at any follow-up point. Beyond this, though, I don't care at how many follow-ups it's observed, and don't want this taken into account by the model. In other words, if subject A says "yes" at follow-up 2 and "no" at follow-ups 1, 3, and 4, while subject B says "yes" at all 4 follow-ups, these should both just be considered "present" and "count the same" in the model.

It's easy enough to create an aggregate "DV ever observed" variable, but constructing an appropriate model is tripping me up. For instance, just throwing it into a GLM is problematic, because this won't account for missing data -- in particular, the fact that DV is more likely to "ever" be observed in subjects with more available follow-ups. One solution to this might be to include the number of follow-ups per subject as weights:

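library(tidyverse)

# Collapse the long-format data to one row per subject
data_collapsed <- data %>%
  group_by(ID, predictor) %>%
  summarize(
    dv_ever = max(dv),  # 1 if the outcome was observed at any follow-up
    n_fu = n(),         # number of rows (available follow-ups) per subject
    .groups = "drop"    # return an ungrouped data frame
  )

# Logistic regression weighted by the number of follow-ups
out <- glm(dv_ever ~ predictor, family = binomial, weights = n_fu, data = data_collapsed)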

But further reading has made me worry that this isn't an appropriate use of weights, especially for binary regression (relevant CV answer; R-help thread). My understanding is that the weights argument is intended for cases where the DV is an aggregate proportion across multiple binary trials -- essentially the mean of the trials, whereas my "ever observed" aggregate is equivalent to the max of the trials.
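To spell out the mean-versus-max distinction, here is a sketch under a simplifying assumption that is mine and not something the data guarantee: a subject's $n$ available follow-ups $y_1, \dots, y_n$ are independent, each with the same probability $p$ of the outcome.

```latex
\begin{aligned}
\text{assumed by weights:}\quad
  & \bar{y} = \frac{1}{n}\sum_{j=1}^{n} y_j,
  \qquad n\,\bar{y} \sim \mathrm{Binomial}(n, p),
  \qquad \mathbb{E}[\bar{y}] = p \\[4pt]
\text{my aggregate:}\quad
  & y_{\text{ever}} = \max_{j} y_j,
  \qquad \Pr(y_{\text{ever}} = 1) = 1 - (1 - p)^{n}
\end{aligned}
```

Under that assumption, the probability of "ever" grows with $n$ even when $p$ is fixed, which is the missing-data problem described above. And with a 0/1 response plus `weights = n_fu`, a subject with `dv_ever = 1` and four follow-ups would be treated as 4 successes out of 4 trials, which is not what was observed.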

So, is this an appropriate use of weights, and an appropriate approach to my problem? If not, what might be? What about simply including number of follow-ups as a covariate?
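To make the covariate idea concrete, here is a minimal sketch on simulated data. Everything about the simulation is hypothetical (the sample size, the coefficients, the way follow-ups go missing); only the final `glm()` call reflects the model I'm asking about, and I'm not claiming it is the right one.

```r
set.seed(1)

# Hypothetical subjects: one predictor, 1 to 4 available follow-ups each
n_subj <- 500
subj <- data.frame(
  ID        = seq_len(n_subj),
  predictor = rnorm(n_subj),
  n_fu      = sample(1:4, n_subj, replace = TRUE)
)

# Expand to long format: one row per available follow-up
long <- subj[rep(seq_len(n_subj), subj$n_fu), ]

# Simulated binary outcome at each follow-up (made-up coefficients)
long$dv <- rbinom(nrow(long), 1, plogis(-1.5 + 0.5 * long$predictor))

# Collapse to one row per subject: was the outcome ever observed?
dv_ever <- tapply(long$dv, long$ID, max)
data_collapsed <- subj
data_collapsed$dv_ever <- as.integer(dv_ever[as.character(subj$ID)])

# Number of follow-ups as a covariate instead of as weights
fit_cov <- glm(dv_ever ~ predictor + n_fu, family = binomial,
               data = data_collapsed)
summary(fit_cov)$coefficients
```

Here each subject contributes exactly one binary observation, and `n_fu` enters only as an adjustment for the opportunity to observe the outcome.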

zephryl