I don't know how to generate time-dependent covariates in R for use in Cox regression.

I know you need to reorganize your dataset into intervals between event times. I believe I can do this with the tutorials floating around. After that, though, I am stuck. Now what? For each covariate, do I now need to calculate its value in each particular time interval? How would I do that? Do I need to go back to the database, grab the date on which, say, a pulse rate was recorded, and then update the pulse value accordingly based on that date?

I just want to confirm that when using time-dependent covariates, you often have to go back, extract more date/time information from the database, and update all the covariate information. Basically, transforming the dataset into the "long" format, as the tutorials show, is not the only thing that I need to do, right?

user798719

1 Answer

It's hard without seeing your data, so I'll try to keep this generic. First of all, there are two main ways a data frame can be laid out for use with the survival package:

The bare-bones

  • ID - a unique variable to identify each unit of analysis (e.g., patient, country, organization)
  • Event - a binary variable to indicate the occurrence of the event tested (e.g., death, revolution, bankruptcy)
  • Time - Time until event or until information ends (right-censoring). The Cox model is best used with continuous time, but when the study is over the course of years (especially regarding countries) monthly spells can do.
  • (Oftentimes) Some covariates

Let's use a made-up model that tries to find the hazard of countries falling into civil war (the event) over ten years (in monthly spells), using a single covariate (prevCivilWar) which is not time-dependent:

# the first country was censored before an event and the second 
# experienced the event after 8 years
id time event prevCivilWar 
 1  120     0   0 
 2   96     1   1 
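As a minimal sketch, assuming the survival package is installed, the bare-bones table above could be entered in R and turned into a survival response like this. The data frame name `wars` is made up for illustration; the column names follow the table.

```r
library(survival)

# The two made-up countries from the table above
wars <- data.frame(id = c(1, 2),
                   time = c(120, 96),        # months until event or censoring
                   event = c(0, 1),          # 1 = civil war occurred, 0 = censored
                   prevCivilWar = c(0, 1))

# Surv() pairs each follow-up time with its event indicator;
# censored observations are printed with a trailing "+"
s <- Surv(wars$time, wars$event)
print(s)
```

With a realistically sized dataset in the same layout, a model could then be fitted with something like `coxph(Surv(time, event) ~ prevCivilWar, data = wars)`; two rows are far too few for that to be meaningful here.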

Adding time-dependent covariates: Method 1

  • Covariate - In this case you need to know the original value, and whether it changed and to what - and if so, when (at what spell).
  • Changing the time variable to start and end - when needed to indicate the time of change for (any of the) covariates

Here we will add a binary variable, is40pov, which indicates whether more than 40% of the population lives in poverty:

id time1 time2 event prevCivilWar is40pov
 1     0    80     0            0       0
 1    80   120     0            0       1
 2     0    24     0            1       0
 2    24    60     0            1       1
 2    60    96     1            1       1

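When using time-dependent covariates, we need to specify the exact time frame until any change in any covariate occurs. Note that consecutive intervals must share an endpoint: time2 of one row is time1 of the next. The survival package treats each interval as open on the left and closed on the right, (time1, time2], so no moment of follow-up is counted twice. If a certain subject has no changes in any covariate, then one row suffices.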
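The survival package's `tmerge()` function can build this start/stop layout from a baseline table plus a table of change times. The following is only a sketch under assumptions: the `pov_change` table and its `month` column are hypothetical stand-ins for whatever you extract from your database, and the new event column is named `war` to keep it distinct from the baseline `event` column.

```r
library(survival)

# Baseline data: one row per country (values from the example above)
base <- data.frame(id = c(1, 2),
                   time = c(120, 96),
                   event = c(0, 1),
                   prevCivilWar = c(0, 1))

# Hypothetical change table: month at which poverty first exceeded 40%
pov_change <- data.frame(id = c(1, 2), month = c(80, 24))

# Step 1: set each country's follow-up window and its event indicator
long <- tmerge(base, base, id = id, war = event(time, event))

# Step 2: add a 0/1 covariate that is 0 before `month` and 1 afterwards
long <- tmerge(long, pov_change, id = id, is40pov = tdc(month))

print(long[, c("id", "tstart", "tstop", "war", "prevCivilWar", "is40pov")])
```

Here `tstart` and `tstop` play the role of time1 and time2. Because only one covariate changes, country 2 is split once (at month 24) rather than also at month 60. On a realistically sized dataset the result could be passed to `coxph(Surv(tstart, tstop, war) ~ prevCivilWar + is40pov, data = long)`.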

Method 2 - best for continuously changing covariates

This layout has one row per spell for each unique ID: $k$ rows if the unit is censored at the end of the study, or fewer if the event happened earlier. So, if you have a database with information on the time frame studied, decide on a spell length that makes theoretical sense: if the covariates change on an hourly basis, make it hourly, and so on. Once you have decided on the spell length (e.g., a month) and the total time (e.g., ten years), each ID will have at most $120$ spells.

If you need, create the longitudinal dataset with empty (NA, 0, or whatever) data, for the time-dependent covariates, and make two extra utility columns for dates/times of each spell. Then you can access the database and fetch the specific values for your covariates at those dates/times and fill it in. It is OK if certain rows have no changes in covariates. You will end up with something like:

# The variable pov is the poverty percent of population and measured monthly
id time1 time2 event prevCivilWar pov
 1    0     1   0               0   0.34
 1    1     2   0               0   0.34
 ...
 1   79    80   0               0   0.43
 ...
 1  119   120   0               0   0.41
 2    0     1   0               1   0.25
 ...
 2   23    24   0               1   0.42
 ...
 2   95    96   1               1   0.58 
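One way to build such a spell-per-row skeleton and then fill in the covariate is sketched below in base R. The `pov_monthly` table stands in for the values you would fetch from your database; its contents here are placeholder numbers, not real measurements, and all object names are assumptions for illustration.

```r
# Baseline data: one row per country (values from the example above)
base <- data.frame(id = c(1, 2),
                   time = c(120, 96),
                   event = c(0, 1),
                   prevCivilWar = c(0, 1))

# Expand each country into one row per monthly spell;
# the event indicator can only be 1 in the last spell
spells <- do.call(rbind, lapply(seq_len(nrow(base)), function(i) {
  n <- base$time[i]
  data.frame(id = base$id[i],
             time1 = 0:(n - 1),
             time2 = 1:n,
             event = c(rep(0, n - 1), base$event[i]),
             prevCivilWar = base$prevCivilWar[i])
}))

# Placeholder for the monthly values fetched from the database
pov_monthly <- data.frame(id = spells$id,
                          time1 = spells$time1,
                          pov = round(0.30 + 0.001 * spells$time1, 3))

# Attach the covariate value that applies to each spell
long <- merge(spells, pov_monthly, by = c("id", "time1"), all.x = TRUE)
long <- long[order(long$id, long$time1), ]

print(head(long, 3))
```

Spells with no matching measurement would come out as NA after the merge, which makes gaps in the database easy to spot. With the survival package loaded and enough units, the model call would be along the lines of `coxph(Surv(time1, time2, event) ~ prevCivilWar + pov, data = long)`.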

For more info on time-dependent covariates and coefficients, see Therneau, Crowson and Atkinson's 2016 Vignette.

Yuval Spiegler