
I am trying to deal with a time-to-event analysis using repeated binary outcomes. Suppose that time-to-event is measured in days but for the moment we discretize time to weeks. I want to approximate a Kaplan-Meier estimator (but allow for covariates) using repeated binary outcomes. This will seem like a roundabout way to go but I'm exploring how this extends to ordinal outcomes and recurrent events.

If you create a binary sequence that looks like 000 for someone censored at 3 weeks, 0000 for someone censored at 4w, and 0000111111111111.... for a subject who failed at 5w (the 1s extend to the point at which the last subject was followed in the study), when you compute week-specific proportions of 1s you can get ordinary cumulative incidences (until you get to variable censoring times, where this only approximates but doesn't equal Kaplan-Meier cumulative incidence estimates).
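The expansion described above can be sketched in code. This is a minimal illustration only; the subject data and the helper name `expand_subject` are made up for the example:

```python
# Expand each subject into weekly 0/1 records as described above:
# 0s up to censoring or failure, then 1s carried forward from the
# failure week through the last week anyone was followed.

def expand_subject(time, event, max_week):
    """Weekly 0/1 sequence for one subject.

    time     : week of censoring (event=0) or failure (event=1)
    event    : 1 if the subject failed, 0 if censored
    max_week : last week any subject was followed in the study
    """
    if event:
        # 0s before the failure week, then 1s to the end of follow-up
        return [0] * (time - 1) + [1] * (max_week - time + 1)
    # censored: 0s through the censoring week, no records afterwards
    return [0] * time

# Hypothetical subjects: (time in weeks, event indicator)
subjects = [(3, 0), (4, 0), (5, 1), (2, 1), (6, 0)]
max_week = max(t for t, _ in subjects)

records = [expand_subject(t, e, max_week) for t, e in subjects]

# Week-specific proportion of 1s among subjects still contributing
# a record that week = (approximate) cumulative incidence
ci = []
for week in range(1, max_week + 1):
    obs = [r[week - 1] for r in records if len(r) >= week]
    ci.append(sum(obs) / len(obs))
print(ci)
```

With variable censoring times, as noted above, these proportions only approximate the Kaplan-Meier cumulative incidence rather than reproduce it.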

I can fit the repeated binary observations with a binary logistic model using GEE, modeling time with a spline instead of making it discrete as above. The cluster sandwich covariance estimator works reasonably well. But I'd like to get more exact inference by using a mixed effects model. The problem is that the 1s after the first 1 are redundant. Does anyone know of a way to specify random effects, or to specify a model, that takes the redundancies into account so that standard errors will not be deflated?

Note that this setup differs from Efron's because he was using logistic models to estimate conditional probabilities in risk sets. I'm estimating unconditional probabilities.

Frank Harrell

3 Answers

As far as I can see, with both GEE and a mixed model for repeated binary observations, you will have the problem that the model assigns positive probability to a '0' after the first '1' has been observed.

In any case, given that you want to get estimates from a mixed effects logistic regression that will have the same interpretation as in the GEE (see here for more info), you could fit the model using the mixed_model() function from the GLMMadaptive package, and then use marginal_coefs(). For an example, see here.

Dimitris Rizopoulos
    Thanks Dimitris. For my case with redundant 1's (to get the mean function right) I think I'll need a modified model or a strange random effects setup. The `GLMMadaptive` package looks terrific for the more general setup. – Frank Harrell Aug 26 '19 at 18:00
A couple of thoughts about this:

  1. It seems that a mixed effects model is fundamentally a 'conditional' probability model, i.e., it models the probability of an event for a subject who is at risk for that event.

  2. We know the probability of a '1' after the first '1' is one. Thus, there is no additional information in the subsequent '1' values.

  3. It seems that, because subsequent '1' values contain no additional information, they should have no impact on the likelihood function, and thus have no impact on standard errors of likelihood-based estimators, nor the estimates themselves. Indeed, there would be no impact of subsequent '1' values if p(y='1'|x)=1 regardless of model parameter values, as it should be.

  4. We might be able to force this behavior (i.e., p(y='1'|x)=1), and retain the desired mean function, by adding an indicator covariate to the model that marks subsequent ones, and by forcing its coefficient to be very large so that effectively p(y='1'|x)=1.

  5. As you mentioned, there may also be a way to force the first '1' and subsequent responses to have 100% correlation. But in a binomial model, that is the same as p(y='1'|x)=1 for subsequent responses.
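Point 4 can be checked numerically. A quick sketch, in which the indicator coefficient of 100 and the covariate contribution of 0.3 are arbitrary values chosen only for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Linear predictor for a redundant '1' record: an ordinary covariate
# contribution plus a very large coefficient on the 'after first 1'
# indicator, which forces the fitted probability to 1.
eta = 0.3 + 100.0 * 1   # 0.3: arbitrary covariate part; 100: huge indicator coefficient

p = sigmoid(eta)         # fitted P(y=1|x): numerically 1
loglik = math.log(p)     # log-likelihood contribution for y=1: ~0
score = 1.0 - p          # score contribution for y=1: ~0
info = p * (1.0 - p)     # Fisher information contribution: ~0
print(p, loglik, score, info)
```

Because the score and information contributions of the redundant records vanish, they neither move the estimates nor deflate the standard errors, which is the behavior wanted in the question.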

    Thanks Matt. If I wasn't wanting a full model but was content with estimating equations, what you are getting at is adding the duplicate responses to the score function to get the mean function right, but not adding them to the information function. I don't think I can add an indicator covariate, because that would take away from, e.g., the treatment effect. I think of the mixed effect model as more of an unconditional model. When the event is not an absorbing state, you're modeling marginal effects in a time-dependent way. – Frank Harrell Aug 29 '19 at 22:30

I'm not exactly sure what you're trying to do, but can you fit a pooled logistic regression model (https://www.ncbi.nlm.nih.gov/pubmed/2281238)? In this case you would only include 1 during the interval of the terminal event -- it would not repeat after the event has occurred. You would include time in the model in a flexible manner (e.g., expanded using splines).
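The person-period layout for pooled logistic regression (rows stop at the event, unlike the carried-forward 1s in the question) can be sketched as follows; the subject values and the helper name `person_period` are hypothetical:

```python
# Person-period records for pooled logistic regression: one row per
# subject-week while at risk, y=1 only in the event week, and no rows
# after the event or censoring week.

def person_period(time, event):
    """Rows (week, y) for one subject followed to `time` weeks."""
    rows = [(week, 0) for week in range(1, time)]
    rows.append((time, 1 if event else 0))
    return rows

# Hypothetical subjects: censored at 3 weeks, failed at 5 weeks
for t, e in [(3, 0), (5, 1)]:
    print(person_period(t, e))
```

In an actual fit, the week value in each row would enter the model flexibly, e.g., through a spline basis, as the answer suggests.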

    Hey Bryan - I really like pooled logistic regression and have used it often. But if you terminate a subject's observations at the terminal event, and have other subjects followed beyond that point without an event, you'll get the mean function (P(event by time t)) wrong. I want to get near-Kaplan-Meier cumulative incidence estimates for the mean function at least in special cases. – Frank Harrell Aug 29 '19 at 22:34