I realized my original post was really two questions, so I'm splitting it up. My other post describes the same problem, and asks whether using weights in a GLM would be appropriate.
My data include a baseline with up to four follow-up observations (but with lots of missing follow-ups). I'm interested in predicting whether a particular binary outcome is present at any follow-up point. Beyond this, though, I don't care how many follow-ups it's observed at, and I don't want the model to take that count into account. In other words, if subject A says "yes" at follow-up 2 and "no" at follow-ups 1, 3, and 4, while subject B says "yes" at all 4 follow-ups, these should both just be considered "present" and "count the same" in the model.
Given the data structure, mixed effects modeling seems like a natural fit, and I attempted a logistic mixed effects model with random intercepts for subject:
library(lme4)
out <- glmer(dv ~ predictor + follow_up + (1 | ID), family = binomial, data = data)
But this model will assign more likelihood to cases where dv == 1 across 3 observations, say, relative to those where dv == 1 only once, which isn't what I want. In line with this, predicted probabilities from this model (using emmeans) are much lower than observed rates for a "dv ever present" variable.
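To make the "dv ever present" variable concrete, here is a minimal sketch of how it could be computed from long-format data. The toy values and the name dv_ever are made up for illustration; only the column names ID, follow_up, and dv come from the model above. Subject A says "yes" once, subject B says "yes" at every follow-up, and both end up with the same value.

```r
# Toy long-format data (hypothetical values); subject C has two missing follow-ups.
toy <- data.frame(
  ID        = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C"),
  follow_up = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2),
  dv        = c(0, 1, 0, 0, 1, 1, 1, 1, 0, 0)
)

# Collapse to one row per subject: 1 if dv was 1 at any observed follow-up, else 0.
ever <- aggregate(dv ~ ID, data = toy, FUN = max)
names(ever)[names(ever) == "dv"] <- "dv_ever"  # dv_ever is an illustrative name

ever                # one row per subject
mean(ever$dv_ever)  # observed "ever present" rate across subjects
```

Note that with missing follow-ups, a subject whose observed follow-ups are all "no" (like C here) is coded 0 even though the outcome might have been present at an unobserved visit.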
Is there another way to specify a GLMM, or structure the data, to model whether the DV was "ever" observed? Or is there a better approach to this problem?