Discrete-Time Event History (Survival) Model in R

Question

I'm trying to fit a discrete-time model in R, but I'm not sure how to do it.

I've read that you can organize the dependent variable in different rows, one for each time-observation, and then use the glm function with a logit or cloglog link. In this sense, I have three columns: ID, Event (1 or 0, in each time-obs) and Time Elapsed (since the beginning of the observation), plus the other covariates.

How do I write the code to fit the model? Which is the dependent variable? I guess I could use Event as the dependent variable, and include the Time Elapsed in the covariates. But what happens with the ID? Do I need it?

Thanks.

When you say "I'm trying to fit a discrete time model" ... what model do you want to fit? (If this is for some subject, please add the `self-study` tag.) — Glen_b, Apr 25 '13 at 10:03
It seems unlikely that ID is relevant, but it depends on what, exactly, it represents and whether that's something you want to model. — Glen_b, Apr 25 '13 at 13:18

ndoogan · Answer 1 · 2014-02-26T03:27:26.993

You're basically right about data organization. If you have cases organized like this:

ID M1 M2 M3 EVENT

You will likely want to reorganize the data so that it looks like this:

ID TIME EVENT
1  1    0
1  2    1
1  3    1
2  1    0
2  2    0
.  .    .
.  .    .

I call this a conversion from a wide format to a long format. It is done easily in R using the reshape() function or even more easily with the reshape2 package.
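As a minimal sketch of that wide-to-long step with base reshape(): the toy data below are made up, and I am assuming that M1 to M3 each hold the event indicator for one period, with NA once a subject is no longer observed. Your M columns may mean something else, in which case the reshaping call would need adjusting.

```r
# Hypothetical wide data: one row per subject. M1..M3 are ASSUMED to be
# per-period event indicators (NA = subject no longer at risk).
wide <- data.frame(ID = 1:3,
                   M1 = c(0, 0, 0),
                   M2 = c(1, 0, 0),
                   M3 = c(NA, 0, 1))

# Stack the three M columns into a single EVENT column indexed by TIME.
long <- reshape(wide, direction = "long",
                varying = c("M1", "M2", "M3"), v.names = "EVENT",
                timevar = "TIME", times = 1:3, idvar = "ID")

# Drop periods after a subject has left the risk set, then sort by subject.
long <- long[!is.na(long$EVENT), ]
long <- long[order(long$ID, long$TIME), ]
rownames(long) <- NULL
long
```

Each subject now contributes one row per period at risk, which is the layout the models below expect.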

I personally would keep the ID field for its potential use in identifying a source of variation in a mixed effects model. But this is not necessary (as pointed out by @BerndWeiss). The following assumes you would want to do so. If not, fit a similar model with glm(...,family=binomial) without the random effect terms.

The lme4 package in R will fit a mixed effects logistic regression model similar to the one you're talking about, except with a random effect or two to account for variability in the coefficients across subjects (ID). The following would be example code for fitting an example model if your data are stored in a data frame called df.

library(lme4)  # provides glmer()
# Logistic model with a random intercept and a random TIME slope per ID
ans <- glmer(EVENT ~ TIME + (1 + TIME | ID), data = df, family = binomial)

This particular model allows the TIME and the intercept coefficients to vary randomly across ID. In other words, this is a hierarchical generalized linear mixed model (a mixed-effects logistic regression) of measurements nested in individuals.

An alternative form of a discrete-time event history model breaks TIME into discrete dummies and fits each as a parameter. This is essentially the discrete case of the Cox PH model because the hazard curve is not restricted to being linear (or quadratic, or however you can imagine transforming time). However, if there are a lot of time periods, you may wish to group TIME into a small, manageable set of them.
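A rough sketch of the time-dummy version, leaving out the random effects and any covariates to keep it short. The data are simulated and the names (dur, status, pp) are just placeholders I made up. With one dummy per period and no intercept, each coefficient is the logit of that period's hazard, so the fitted hazards simply reproduce the observed proportion of events among those at risk in each period.

```r
set.seed(42)
n <- 300
dur    <- sample(1:4, n, replace = TRUE)  # last period each subject is observed
status <- rbinom(n, 1, 0.7)               # 1 = event in last period, 0 = censored

# Person-period data: one row per subject per period at risk.
pp <- data.frame(ID = rep(seq_len(n), dur), TIME = sequence(dur))
pp$EVENT <- as.integer(pp$TIME == dur[pp$ID] & status[pp$ID] == 1)

# One dummy per period, no common intercept.
fit <- glm(EVENT ~ 0 + factor(TIME), family = binomial, data = pp)

# Back-transform the coefficients to period-specific hazards and
# compare them with the raw event proportions per period.
hazard   <- plogis(coef(fit))
observed <- tapply(pp$EVENT, pp$TIME, mean)
cbind(fitted = unname(hazard), observed = as.vector(observed))
```

Adding covariates to the right-hand side shifts these baseline logit hazards up or down for each subject.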

Further alternatives involve transforming time to get your hazard curve right. The dummy-variable method basically relieves you of having to do this, but it is less parsimonious than a transformation (or the original linear case I posed) because you may have a lot of time points and thus a lot of nuisance parameters.
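To make the parsimony trade-off concrete, here is a hedged sketch comparing a linear, a quadratic, and a fully dummy-coded specification of time on simulated data (the data and object names are invented for illustration). The three models are nested, so a likelihood-ratio test via anova() is one reasonable way to check whether the extra parameters earn their keep.

```r
set.seed(7)
n <- 500
dur    <- sample(1:6, n, replace = TRUE)  # last period each subject is observed
status <- rbinom(n, 1, 0.6)               # 1 = event in last period, 0 = censored

# Person-period data: one row per subject per period at risk.
pp <- data.frame(ID = rep(seq_len(n), dur), TIME = sequence(dur))
pp$EVENT <- as.integer(pp$TIME == dur[pp$ID] & status[pp$ID] == 1)

fit_linear  <- glm(EVENT ~ TIME,             family = binomial, data = pp)
fit_quad    <- glm(EVENT ~ TIME + I(TIME^2), family = binomial, data = pp)
fit_dummies <- glm(EVENT ~ factor(TIME),     family = binomial, data = pp)

# Number of parameters spent on the time trend in each specification.
n_par <- c(linear  = length(coef(fit_linear)),
           quad    = length(coef(fit_quad)),
           dummies = length(coef(fit_dummies)))
n_par

# Likelihood-ratio tests between the nested specifications.
anova(fit_linear, fit_quad, fit_dummies, test = "Chisq")
```

With only six periods the dummy model costs six parameters; with dozens of periods the gap to a two- or three-parameter curve grows quickly.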

An excellent reference on this topic is Judith Singer and John Willett's Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence.

You do not need a "mixed effects logistic regression model" to estimate a simple discrete-time model (Fiona Steele has published a few articles on "[Multilevel discrete-time event history analysis](http://www.bris.ac.uk/education/people/fiona-a-steele/pub/2781928)"). Do you have a reference? Re the data preparation step, I also suggest having a look at the [survSplit](http://stat.ethz.ch/R-manual/R-patched/library/survival/html/survSplit.html) function. — Bernd Weiss, Apr 25 '13 at 12:02
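For the data preparation step mentioned in the comment above, a small sketch of survSplit() on made-up data. It assumes you start with one row per subject holding the follow-up time and an event indicator, and that the periods are the unit intervals (0, 1], (1, 2], (2, 3]; the data frame and column names are hypothetical.

```r
library(survival)

# Hypothetical one-row-per-subject data: follow-up time and event status.
subjects <- data.frame(id = 1:3,
                       time = c(2, 3, 3),
                       status = c(1, 0, 1))

# Cut follow-up at 1 and 2, giving one row per subject per unit interval.
# The event indicator is 1 only in the interval where the event occurs.
pp <- survSplit(Surv(time, status) ~ ., data = subjects, cut = 1:2)
pp
```

In the result, tstart and time bound each interval, so with unit-width periods the time column can serve directly as the period index in a glm call.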

Bernd Weiss · Answer 2 · 2013-04-25T11:57:05.110

Singer and Willett have published a lot on this subject. I highly recommend that you read some of their papers. You also might want to get their book "Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence". It is clearly one of the best textbooks in this field.

For most book chapters there is R sample code available (see chapters 11ff) that demonstrates how your data have to be structured ("person-period format") and how to analyze that kind of data. For a standard discrete-time model you do not need the ID variable, and you also do not need to estimate a mixed-effects model as suggested by @ndoogan. A simple glm(event ~ time + ..., family = "binomial") works just fine. Singer and Willett also discuss at length how to model the time variable (linear, quadratic, ...).
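A minimal, self-contained sketch of that glm call on simulated person-period data, fitted with both links the question mentions. Everything here is made up for illustration: the covariate x, the true coefficients, and the five-period observation window.

```r
set.seed(1)
n <- 200
x <- rnorm(n)      # hypothetical time-constant covariate
max_t <- 5         # assumed length of the observation window

# Simulate one subject: the event occurs in the first period whose
# random draw falls below the hazard; otherwise the subject is censored.
simulate_subject <- function(i) {
  hazard   <- plogis(-2 + 0.2 * seq_len(max_t) + 0.5 * x[i])
  occurred <- which(runif(max_t) < hazard)
  last     <- if (length(occurred) > 0) occurred[1] else max_t
  data.frame(id = i, time = seq_len(last),
             event = as.integer(seq_len(last) == last & length(occurred) > 0),
             x = x[i])
}
pp <- do.call(rbind, lapply(seq_len(n), simulate_subject))

# event is the dependent variable; id is not needed in the model.
fit_logit   <- glm(event ~ time + x, family = binomial(link = "logit"),
                   data = pp)
fit_cloglog <- glm(event ~ time + x, family = binomial(link = "cloglog"),
                   data = pp)

exp(coef(fit_logit))    # odds ratios for the discrete-time hazard
exp(coef(fit_cloglog))  # hazard ratios under a grouped proportional hazards reading
```

The two links usually give similar conclusions when the per-period hazard is small; the choice mainly affects how the coefficients are interpreted.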

To cite two more references that I highly recommend:

Allison (1982): "Discrete-Time Methods for the Analysis of Event Histories" (PDF) (the Allison article also discusses why you can use a standard glm instead of a mixed-effects model)
Mills (2011): "Introducing Survival and Event History Analysis"

score 1 · Answer 3 · answered Oct 07 '17 at 13:53

You can break time into intervals and fit a multiperiod logit model as in Shumway (2001). E.g., your time intervals are $(0, 1], (1, 2], \dots$. I have implemented this in dynamichazard::static_glm in R, which is directly applicable if you have initial data in a typical stop-event setup used in survival analysis. Do notice that the t-stats from the resulting model do not have the correction mentioned in Shumway (2001).

This method differs from the one @ndoogan describes with time dummies, as you only get one common intercept for all time periods with dynamichazard::static_glm. You can, however, get a dummy for each period by calling dynamichazard::get_survival_case_weights_and_data with argument use_weights = FALSE, adding the time dummy indicator yourself to the returned data.frame, and then calling e.g. glm.

Further, you may be interested in [this vignette](https://cran.r-project.org/web/packages/dynamichazard/vignettes/Comparing_methods_for_logistic_models.pdf) in my package `dynamichazard`. — Benjamin Christoffersen, Oct 25 '17 at 21:40

score 0 · Answer 4 · answered Aug 14 '18 at 11:05

This is called "counting process" data. The survival package has a very nice tmerge() function. It is very useful for inserting time-dependent or cumulative covariates and partitioning the follow-up time accordingly. The process is very well explained in this vignette
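A small hedged sketch of what tmerge() does, on invented data: two subjects with a follow-up time and status, plus a hypothetical day on which each one starts a treatment. The first call sets up the time range and the event; the second splits the follow-up at the treatment day and adds a 0/1 time-dependent covariate.

```r
library(survival)

# Hypothetical baseline data: one row per subject.
base <- data.frame(id = 1:2, futime = c(5, 4), status = c(1, 0))

# Hypothetical time-dependent information: day treatment started.
tv <- data.frame(id = 1:2, day = c(2, 1))

# Step 1: establish each subject's follow-up interval and the event.
d <- tmerge(base, base, id = id, death = event(futime, status))

# Step 2: split the follow-up at 'day'; treated is 0 before and 1 after.
d <- tmerge(d, tv, id = id, treated = tdc(day))
d[, c("id", "tstart", "tstop", "death", "treated")]
```

Each subject ends up with two rows, (tstart, tstop] intervals before and after treatment, with the event indicator set only on the interval in which the event occurs.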
