Logistic regression model for hospital readmissions: accounting for multiple admissions

Question

I am evaluating the impact of a post-discharge program on readmissions at the hospital where I work. I'm using logistic regression, with each hospital admission within a two-year time frame coded 1/0 for whether or not a readmission occurred. Covariates are mostly patient factors related to readmission risk (comorbidities, gender, lab values, etc.).

My question is around how to handle patients who have multiple admissions in the time frame. I'm worried that if I include multiple admissions per patient, the data from these patients will bias the estimation of the parameters towards frequent users versus patients with fewer admissions or only one admission. I've read published studies that are similar to what I'm doing and most seem to handle this issue by keeping only a single "anchor" admission for each patient, dropping the others.

I'm worried that if I follow this approach, I'll lose too much data. As is, I work for a small/medium-sized hospital and I've got about 2,600 admissions to work with. If I keep only, say, each patient's earliest admission in the time period, I lose about 600 observations. I'm also concerned that if I only keep one admission per person, my model won't reflect the real world where people do, in fact, have multiple admissions.

My thought is to keep all 2,600 observations and estimate two models. The first model would use the `glm` command to estimate a plain vanilla fixed-effects logistic regression. The second model would use the `glmer` command from lme4 to estimate a mixed model with a random intercept for patient ID. I'd take a similar approach as is done here. I can compare the models using AIC. If there is no meaningful difference, I'd use the fixed-effects regression, since its interpretation is a bit more straightforward.

I'm interested in thoughts on this approach and whether my concerns present a valid problem for inference or show a misunderstanding of certain concepts.

Do you have information on the times between discharges and readmissions for each individual? — EdM, Jan 26 '22 at 04:11

score 7 · Accepted Answer · answered Jan 26 '22 at 15:54

If you have actual discharge and readmission dates for each patient, then this might best be handled with repeated-event survival analysis. That's presented in the main R survival vignette and the multi-state vignette.

Such models directly deal with the within-patient correlations that rightly concern you. You can choose a robust variance estimate similar to those of generalized estimating equations by specifying a cluster() term for the individual IDs, or a frailty() term to handle the correlations with a simple type of random effect. There's also a coxme package for more complicated mixed-model designs.

In addition, survival analysis will provide information about the time to readmission, not just the fact of readmission. That's important if you are particularly interested in things like readmission within 30 days. Survival analysis will also handle the fact that a patient's readmission might occur after your observation period while taking into account the information you get from that patient while still under observation.

The question then will be how to choose the time = 0 reference for survival. It seems that you should reset the clock to 0 at each discharge time to estimate the time to the subsequent readmission. You could then specify covariate values in place at each individual discharge for the patient. More detailed modeling with a multi-state model (e.g., states of: in hospital, out of hospital, death) might be considered.
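As a rough illustration of the "reset the clock at each discharge" idea, the sketch below builds one row per discharge, with the days until the next admission or until the end of observation (censoring). It is plain Python rather than R, and the records, the `STUDY_END` date, and the column names are all hypothetical. Death and other competing events are not handled.

```python
from datetime import date

# Hypothetical admission records: (patient_id, admit_date, discharge_date).
admissions = [
    ("A", date(2020, 1, 5), date(2020, 1, 9)),
    ("A", date(2020, 1, 20), date(2020, 1, 25)),
    ("A", date(2020, 6, 1), date(2020, 6, 4)),
    ("B", date(2021, 11, 28), date(2021, 12, 20)),
]
STUDY_END = date(2021, 12, 31)  # assumed end of the observation window


def gap_time_rows(admissions, study_end):
    """Return one row per discharge, with the clock reset to 0 at that discharge."""
    by_patient = {}
    for pid, admit, discharge in sorted(admissions):
        by_patient.setdefault(pid, []).append((admit, discharge))

    rows = []
    for pid, stays in by_patient.items():
        for i, (_, discharge) in enumerate(stays):
            if i + 1 < len(stays):
                # A later admission exists: the readmission event is observed.
                days, event = (stays[i + 1][0] - discharge).days, 1
            else:
                # No later admission: follow-up is censored at the study end.
                days, event = (study_end - discharge).days, 0
            rows.append({
                "patient_id": pid,
                "discharge": discharge,
                "days": days,
                "event": event,
                "readmit_30d": int(event == 1 and days <= 30),
                # False when censoring happened before 30 days had elapsed.
                "full_30d_followup": event == 1 or days >= 30,
            })
    return rows


rows = gap_time_rows(admissions, STUDY_END)
for row in rows:
    print(row)
```

A table of this shape (`days`, `event`, and a patient identifier for clustering) is the kind of input a gap-time survival model expects. The `full_30d_followup` flag marks discharges too close to the study end for a binary 30-day outcome to be known, which a plain logistic model cannot account for.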

It's a good and valuable suggestion - I've not had any experience with survival analysis and will definitely take some time to learn. Currently, I'm working under a deadline and don't have the necessary time to learn before the results are due. I'm also constrained by my employer's specific query: the effect of the program on the probability of 30-day readmission. Given that binary logistic regression is the estimator of choice, any thoughts on my particular approach? — Dan, Jan 26 '22 at 18:06
@Dan Use logistic regression but take intra-patient correlations into account as I suggest for survival: robust errors from GEE models via [`geepack`](https://cran.r-project.org/package=geepack) or a mixed model with patients as random effects via `glmer()` as you propose. I don't see a need for a separate model without random effects. The _interpretation_ of fixed-effect coefficients is the same in both; theoretical disputes about [details of p-value calculations](https://stats.stackexchange.com/q/22988/28500) shouldn't affect any business decisions based on the results. — EdM, Jan 26 '22 at 18:42
@Dan the [`Zelig` package](https://cran.r-project.org/package=Zelig) might be useful here; see [this page](http://docs.zeligproject.org/articles/zelig_logitgee.html) for an example of using it to handle correlations in logistic regression via generalized estimating equations (GEE). That said, it would be better if you could get the time to implement survival analysis or the discrete-time state-transition model that Frank Harrell recommends in a comment on another answer. — EdM, Jan 26 '22 at 18:55

score 1 · Answer 2 · answered Jan 26 '22 at 12:09


If your response (number of re-admissions) ranges, for example, from 0 to 4 or so, you could use ordinal logistic regression. However, the maximum number of re-admissions in your model shouldn't be too high (maybe 3 or 4). Otherwise, you could combine all cases with 4 or more re-admissions into a single category such as "4+". That said, I find the interpretation of the coefficients (at least using the `polr` function from the MASS package in R) a bit tedious.
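A minimal sketch of the collapsing step described above, in plain Python. The per-patient counts and the cap of 4 are hypothetical; the resulting ordered labels would then be passed to whatever ordinal regression routine is used.

```python
from collections import Counter

# Hypothetical number of readmissions per patient over the study window.
readmission_counts = {"A": 0, "B": 1, "C": 6, "D": 4, "E": 2, "F": 0}


def to_ordinal_level(count, cap=4):
    """Map a raw count to an ordered category, pooling counts >= cap into 'cap+'."""
    if count < 0:
        raise ValueError("readmission count cannot be negative")
    return f"{cap}+" if count >= cap else str(count)


# Ordered category labels for the assumed cap of 4: 0, 1, 2, 3, 4+.
LEVELS = [str(k) for k in range(4)] + ["4+"]

levels = {pid: to_ordinal_level(n) for pid, n in readmission_counts.items()}
tally = Counter(levels.values())

for level in LEVELS:
    print(level, tally.get(level, 0))
```

Checking the tally before fitting shows whether the top categories are too sparse to estimate and whether a lower cap would be more sensible.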

answered Jan 26 '22 at 12:09 by RomanS

My response is a binary outcome: readmitted or not. – Dan Jan 26 '22 at 15:00

@Dan I think that the suggestion here is to model the total number of re-admissions rather than just a yes/no about readmission. For that an ordinal logistic regression could make sense. – EdM Jan 26 '22 at 15:33

Both ordinal regression and recurrent time-to-event analysis make a lot of sense here. Sometimes for recurrent event analysis it is simpler and more interpretable to model current status (as a function of time) using a discrete time state transition model. This is a form of longitudinal binary logistic model using for example a Markov process. This yields very interpretable things such as expected number of admissions in x years, expected time in hospital, etc. More here: https://hbiostat.org/proj/covid19 – Frank Harrell Jan 26 '22 at 17:04
An ordinal model is a nice idea and an approach I'm familiar with. I'm not sure that modeling a count of readmissions is as important to my employer as the likelihood of being readmitted within 30 days of any particular admission. Most patients with readmissions have just 1 within 30 days of discharge. It might also be difficult, from a validity standpoint, to build a model that predicts admissions well down the road, given information about the current state. – Dan Jan 26 '22 at 18:11
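To make the discrete-time state transition idea from the comments above more concrete, here is a toy two-state Markov chain (home / hospital) with one step per day. The function name and the constant transition probabilities are hypothetical. In an actual analysis those probabilities would come from a fitted longitudinal logistic model and would depend on covariates and the previous state.

```python
def expected_outcomes(p_admit, p_discharge, horizon_days, start_in_hospital=False):
    """Two-state daily Markov chain with constant transition probabilities.

    p_admit:     P(in hospital today | at home yesterday)
    p_discharge: P(at home today | in hospital yesterday)
    Returns (expected days in hospital, expected number of admissions).
    """
    prob_hospital = 1.0 if start_in_hospital else 0.0
    expected_days = 0.0
    expected_admissions = 0.0
    for _ in range(horizon_days):
        # Probability of moving from home to hospital on this day.
        new_admissions = (1.0 - prob_hospital) * p_admit
        # Today's occupancy: those who stayed in hospital plus the newly admitted.
        prob_hospital = prob_hospital * (1.0 - p_discharge) + new_admissions
        expected_admissions += new_admissions
        expected_days += prob_hospital
    return expected_days, expected_admissions


# Illustrative values only, not estimates from any data.
days_in_hospital, admissions_count = expected_outcomes(
    p_admit=0.002, p_discharge=0.2, horizon_days=365
)
print(round(days_in_hospital, 2), round(admissions_count, 2))
```

Summing state occupancy probabilities over time in this way gives quantities such as expected time in hospital and expected number of admissions over a chosen horizon.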

score 0 · Answer 3 · answered Jan 26 '22 at 20:08


I think it depends on what you actually want to improve:

- total number of readmissions
- number of patients having no readmission for 12 months
- etc.

Your model should correspond to the actual objective you want to achieve.

answered Jan 26 '22 at 20:08 by chrishmorris


This should be moved to a comment. – Frank Harrell Jan 26 '22 at 20:23
