How to use time-dependent covariates with Cox regression in R

Question

I don't know how to generate time-dependent covariates in R for use in Cox regression.

I know you need to reorganize your dataset into intervals between event times. This I believe I can do with the tutorials floating around. After that, though, I am stuck. Now what? For each covariate, do I now need to calculate its value in each time interval? How would I do that? Do I need to go back to the database, grab the date on which, say, a pulse rate was recorded, and then update the pulse value accordingly?

I just want to confirm that when using time-dependent covariates, you often have to go back, extract more date/time information from the database, and update all the covariate information. Basically, transforming the dataset into the "long" format, as the tutorials show, is not the only thing I need to do, right?

score 9 · Answer 1 · answered Dec 09 '16 at 10:37

It's hard to say without seeing your data, so I'll try to keep this generic. First of all, here are the two main ways a data frame should look for use with the survival package:

The bare-bones

ID - a unique variable to identify each unit of analysis (e.g., patient, country, organization)
Event - a binary variable to indicate the occurrence of the event tested (e.g., death, revolution, bankruptcy)
Time - Time until the event or until information ends (right-censoring). The Cox model is best used with continuous time, but when the study runs over the course of years (especially for countries), monthly spells will do.
(Oftentimes) Some covariates

Let's use a made-up model that tries to estimate the hazard of countries falling into civil war (the event) over ten years (in monthly spells), using a single covariate (prevCivilWar) that is not time-dependent:

# the first country was censored before an event and the second 
# experienced the event after 8 years
id time event prevCivilWar 
 1  120     0   0 
 2   96     1   1
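As a rough sketch of how this bare-bones layout feeds into the survival package, the snippet below fits a Cox model with Surv(time, event). Two rows are not enough to fit anything, so the data here are simulated; the sample size, the hazard rate, and the effect of prevCivilWar are all made-up assumptions for illustration only.

```r
library(survival)

set.seed(1)                      # simulated data, for illustration only
n <- 200
prevCivilWar <- rbinom(n, 1, 0.3)

# Assumed data-generating process: exponential event times (in months),
# with a higher hazard for countries that had a previous civil war.
t_event <- rexp(n, rate = 0.005 * exp(0.7 * prevCivilWar))

countries <- data.frame(
  id           = seq_len(n),
  time         = pmin(t_event, 120),          # follow-up stops at 120 months
  event        = as.integer(t_event <= 120),  # 1 = civil war, 0 = censored
  prevCivilWar = prevCivilWar
)

# One row per country: Surv() takes the follow-up time and the event flag
fit <- coxph(Surv(time, event) ~ prevCivilWar, data = countries)
summary(fit)
```

With time-dependent covariates only the Surv() call changes: it takes a start time, an end time, and the event flag.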

Adding time-dependent covariates: Method 1

Covariate - In this case you need to know the original value, whether it changed, and if so, to what value and when (at which spell).
Start and end times - the single time variable is replaced by a start and an end time, so that a row can end at the time any of the covariates changes.

Here we add a binary variable, is40pov, to indicate >40% poverty:

id time1 time2 event prevCivilWar is40pov
 1     0    80     0            0       0
 1    80   120     0            0       1
 2     0    24     0            1       0
 2    24    60     0            1       1
 2    60    96     1            1       1

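When using time-dependent covariates we need to specify the exact time frame until any change in any covariate occurs. Note that consecutive rows for the same ID share an endpoint: the time2 of one row is the time1 of the next, because each row covers the interval (time1, time2]. If a certain subject has no changes in any covariates, then one row suffices.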
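For building this layout in R, one option is tmerge() from the survival package. The following is a minimal sketch, not the only way to do it. The pov_change table (one row per country, giving the month in which poverty crossed 40%) is a hypothetical stand-in for whatever you pull from your database.

```r
library(survival)

# Bare-bones data: one row per country
base <- data.frame(id = c(1, 2), time = c(120, 96),
                   event = c(0, 1), prevCivilWar = c(0, 1))

# Hypothetical lookup table: month in which poverty first exceeded 40%
pov_change <- data.frame(id = c(1, 2), month = c(80, 24))

# Step 1: set each country's follow-up window and its event
td <- tmerge(data1 = base[, c("id", "prevCivilWar")], data2 = base,
             id = id, event = event(time, event))

# Step 2: add a 0/1 covariate that switches on at 'month'
td <- tmerge(td, pov_change, id = id, is40pov = tdc(month))

td[, c("id", "tstart", "tstop", "event", "prevCivilWar", "is40pov")]

# With a realistically sized data set, the model call would look like:
# coxph(Surv(tstart, tstop, event) ~ prevCivilWar + is40pov, data = td)
```

tmerge() names the interval columns tstart and tstop rather than time1 and time2, and it only starts a new row where a covariate actually changes, so country 2 should come out as two rows here instead of the three shown in the table above.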

Method 2 - best for continuously changing covariates

This will include as many rows per unique ID as there are spells ($k$ rows if censored at the end of the study, or fewer if the event happened earlier). So, if you have a database with information on the time frame studied, decide on a spell length that makes theoretical sense: if the covariates change on an hourly basis, make it hourly, and so on. Once you have decided on the spell length (e.g., a month) and the total time (e.g., ten years), each ID will have <=$120$ spells.

If needed, create the longitudinal dataset with placeholder values (NA, 0, or whatever) for the time-dependent covariates, and add two extra utility columns for the dates/times of each spell. Then you can query the database for the values of your covariates at those dates/times and fill them in. It is OK if certain rows have no changes in covariates. You will end up with something like:

# The variable pov is the share of the population in poverty, measured monthly
id time1 time2 event prevCivilWar  pov
 1     0     1     0            0 0.34
 1     1     2     0            0 0.34
 ...
 1    79    80     0            0 0.43
 ...
 1   119   120     0            0 0.41
 2     0     1     0            1 0.25
 ...
 2    23    24     0            1 0.42
 ...
 2    95    96     1            1 0.58
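A sketch of how this expansion could be done in R: survSplit() from the survival package cuts each country's follow-up into monthly spells, and merge() then attaches the measured values. The pov_monthly table is hypothetical and filled with random numbers here; in practice it would come from your database, keyed by ID and the start of each spell.

```r
library(survival)

base <- data.frame(id = c(1, 2), time = c(120, 96),
                   event = c(0, 1), prevCivilWar = c(0, 1))

# Cut follow-up into one-month spells; the event flag stays 0 on every
# spell except the one in which the event actually occurs
long <- survSplit(Surv(time, event) ~ ., data = base, cut = 1:119)
names(long)[names(long) == "tstart"] <- "time1"
names(long)[names(long) == "time"]   <- "time2"

# Hypothetical monthly measurements (random values, illustration only),
# keyed by country and the month in which the spell starts
set.seed(42)
pov_monthly <- expand.grid(id = c(1, 2), time1 = 0:119)
pov_monthly$pov <- round(runif(nrow(pov_monthly), 0.2, 0.6), 2)

# Attach the value known at the start of each spell
long <- merge(long, pov_monthly, by = c("id", "time1"))
long <- long[order(long$id, long$time1),
             c("id", "time1", "time2", "event", "prevCivilWar", "pov")]
head(long, 3)

# With a realistically sized data set, the model call would look like:
# coxph(Surv(time1, time2, event) ~ prevCivilWar + pov, data = long)
```

Joining on the start of the spell is a design choice: it makes sure that each row only uses information that was already available when the spell began.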

For more info on time-dependent covariates and coefficients, see Therneau, Crowson and Atkinson's 2016 Vignette.

Do you have any code or function that does that in R? – Guilherme Parreira, Mar 03 '19 at 03:04
