I am evaluating the impact of a post-discharge program on readmissions at the hospital where I work. I'm using logistic regression, where each hospital admission within a two-year time frame is coded 1/0 for whether or not a readmission occurred. The predictors are mostly patient factors related to readmission risk (comorbidities, gender, lab values, etc.).
My question is how to handle patients who have multiple admissions in the time frame. I'm worried that if I include multiple admissions per patient, the data from these patients will bias the parameter estimates towards frequent users, at the expense of patients with fewer admissions or only one admission. I've read published studies similar to mine, and most seem to handle this issue by keeping only a single "anchor" admission for each patient and dropping the others.
I'm worried that if I follow this approach, I'll lose too much data. As is, I work for a small/medium-sized hospital and I've got about 2,600 admissions to work with. If I keep only, say, each patient's earliest admission in the time period, I lose about 600 observations. I'm also concerned that if I only keep one admission per person, my model won't reflect the real world where people do, in fact, have multiple admissions.
My thought is to keep all 2,600 observations and estimate two models. The first model would use the `glm` command to estimate a plain vanilla logistic regression with fixed effects only. The second model would use the `glmer` command from `lme4` to estimate a mixed model with a random intercept for patient id. I'd take a similar approach as is done here. I can then compare the models using AIC. If there is no meaningful difference, I'd use the fixed-effects-only regression, since the interpretation is a bit more straightforward.
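To make the proposed comparison concrete, here is a minimal sketch of the two fits and the AIC comparison. The data are simulated stand-ins, not my hospital data, and the column names (`readmit`, `program`, `age_c`, `patient_id`) are hypothetical; only `glm`, `glmer`, and `AIC` are the actual functions I intend to use.

```r
library(lme4)

set.seed(1)

# Simulated stand-in data (assumption): 500 patients with 1 to 4 admissions each.
n_patients <- 500
n_adm      <- sample(1:4, n_patients, replace = TRUE)
patient_id <- rep(seq_len(n_patients), times = n_adm)
n_obs      <- length(patient_id)

# Patient-level quantities, repeated across that patient's admissions.
u     <- rnorm(n_patients, sd = 1)[patient_id]                      # unobserved patient effect
age_c <- ((rnorm(n_patients, mean = 65, sd = 10) - 65) / 10)[patient_id]  # centred, per 10 years

# Admission-level program indicator and simulated outcome.
program <- rbinom(n_obs, 1, 0.5)
eta     <- -1 - 0.5 * program + 0.2 * age_c + u
readmit <- rbinom(n_obs, 1, plogis(eta))

d <- data.frame(readmit, program, age_c, patient_id = factor(patient_id))

# Model 1: ordinary logistic regression, treating all admissions as independent.
fit_glm <- glm(readmit ~ program + age_c, family = binomial, data = d)

# Model 2: mixed model with a random intercept for each patient.
fit_glmer <- glmer(readmit ~ program + age_c + (1 | patient_id),
                   family = binomial, data = d)

# Compare the two fits by AIC (lower is better).
aic_table <- AIC(fit_glm, fit_glmer)
print(aic_table)

# Estimated between-patient variance of the random intercept.
print(VarCorr(fit_glmer))
```

In this sketch the random intercept is deliberately built into the simulation, so it only illustrates the mechanics of the comparison and says nothing about what my real data will show.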
I'm interested in thoughts on this approach and whether my concerns present a valid problem for inference or show a misunderstanding of certain concepts.