
I took a Coursera course about survival analysis. Its examples use a heart-failure dataset, which is also available on Kaggle: https://www.kaggle.com/jackleenrasmybareh/heart-failure

Since finishing the course I have been reading about time-dependent variables. The course dataset does not have any, so I would like to simulate one by treating one of its existing variables, for example arrhythmias, as time-dependent.

I wrote a custom function that simulates the values of this fake variable with a bias that depends on the time to event, so that the variable comes out statistically significant. The problem is that the proportional hazards assumption is not satisfied.

How can I make the simulated data satisfy the proportional hazards assumption while keeping my variable statistically significant?

kjetil b halvorsen
Alfonso_MA

1 Answer


I don't know of a reliable way to start with a data set, keep its specific survival and censoring times, and convert a particular covariate from time-fixed to time-varying in a way that maintains both "significance" and proportional hazards (PH). There might be one, but I'm not clever enough to figure it out.

If you simply want to start learning about time-varying covariates, the time-dependence vignette of the R survival package is a good place to begin; it works through example data sets that the package provides.
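The core data-handling idea in that vignette is the counting-process layout: each subject's follow-up is split into (start, stop] intervals, the covariate is constant within each interval, and the event indicator can be 1 only on the last one. A minimal sketch for a single binary covariate that switches on at one time; the function name, the `Interval` record, and the `arrhythmia` field are hypothetical illustrations, not part of any package.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    id: int
    start: float
    stop: float
    event: int       # 1 only if the event happens at `stop`
    arrhythmia: int  # hypothetical 0/1 covariate, constant within the interval


def to_counting_process(subject_id, follow_up, event, change_time):
    """Split one subject's follow-up at the time a 0/1 covariate switches on.

    change_time=None (or >= follow_up) means the switch is never observed.
    """
    if change_time is None or change_time >= follow_up:
        return [Interval(subject_id, 0.0, follow_up, event, 0)]
    if change_time <= 0:
        # Covariate already "on" at time 0: one interval with value 1.
        return [Interval(subject_id, 0.0, follow_up, event, 1)]
    # The first interval cannot carry the event: the subject survived past it.
    return [
        Interval(subject_id, 0.0, change_time, 0, 0),
        Interval(subject_id, change_time, follow_up, event, 1),
    ]
```

For example, `to_counting_process(1, 120.0, 1, 45.0)` yields two rows, (0, 45] with the covariate at 0 and no event, then (45, 120] with the covariate at 1 and the event.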

If your interest is in simulating survival data, the CRAN Survival Task View lists a dozen packages for that, some of which seem capable of handling time-varying covariates. I haven't used them myself, however. The approach closest to what you want might be to fit a reasonable parametric PH model (e.g., Weibull) to your data set with time-constant covariates, and then use the parameter estimates as the basis for the simulation.
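A minimal sketch of that idea, assuming a Weibull baseline with cumulative hazard lam * t**k and a binary covariate that switches from 0 to 1 at an onset time and stays on. Event times are drawn by inverting the cumulative hazard, so the hazard ratio is exactly exp(beta) at all times and the data satisfy PH by construction; the covariate is not built from the outcome. The function names, the exponential onset distribution, and all parameter values are hypothetical placeholders, not estimates from the heart-failure data. Estimates from a fitted Weibull model (for example R's survreg, which uses an accelerated-failure-time parameterization) would need converting to this hazard form first.

```python
import math
import random


def simulate_weibull_step_covariate(lam, k, beta, switch_time, rng):
    """Draw one event time from a Weibull PH model whose binary covariate
    switches from 0 to 1 at switch_time (never, if switch_time is None)."""
    # Inverse transform: find T such that the cumulative hazard H(T) equals an Exp(1) draw.
    target = -math.log(1.0 - rng.random())  # 1 - random() lies in (0, 1], so log is safe
    if switch_time is None:
        return (target / lam) ** (1.0 / k)
    h_switch = lam * switch_time ** k  # cumulative hazard accrued before the switch
    if target <= h_switch:
        return (target / lam) ** (1.0 / k)
    # After the switch the baseline hazard is multiplied by exp(beta).
    return ((target - h_switch) / (lam * math.exp(beta)) + switch_time ** k) ** (1.0 / k)


def simulate_cohort(n, lam, k, beta, admin_censor, mean_onset=100.0, seed=0):
    """Return (id, follow_up, event, onset) tuples; onset is None if not seen during follow-up."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        # Hypothetical: onset time drawn independently of the event time.
        onset = rng.expovariate(1.0 / mean_onset)
        t = simulate_weibull_step_covariate(lam, k, beta, onset, rng)
        follow_up = min(t, admin_censor)  # administrative censoring only
        event = int(t <= admin_censor)
        rows.append((i, follow_up, event, onset if onset < follow_up else None))
    return rows
```

The resulting data should be analysed in the (start, stop] layout with the covariate switching at `onset`; treating it as fixed at baseline would reintroduce a time-dependence problem. Increasing |beta| or n makes a "significant" coefficient more likely without breaking PH.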

Be warned that there can be hidden "gotchas" in setting up time-dependent covariates for survival analysis. You must make sure that the covariate trajectories over time make sense, and keep in mind that, if there is at most one event per individual, specifying a covariate value at some time implies that the individual has already survived to that time. For that reason some, like the author of the Python lifelines package, decline to consider model predictions from new sets of time-dependent covariates.
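Some of those gotchas can be caught mechanically in the (start, stop] layout. A minimal sketch of a per-subject sanity check, assuming continuous follow-up and at most one event per subject; the function name and the tuple format are hypothetical. Gaps between intervals can be legitimate in some study designs, but in simulated data they usually signal a construction error.

```python
def check_subject_intervals(rows):
    """rows: (start, stop, event) tuples for ONE subject, in any order.

    Returns a list of problem descriptions (empty if none found).
    """
    problems = []
    rows = sorted(rows)
    for i, (start, stop, event) in enumerate(rows):
        if not start < stop:
            problems.append(f"interval {i}: start {start} is not before stop {stop}")
        # With one event per subject, any interval after the event would record
        # covariate values for someone who is no longer at risk.
        if event and i != len(rows) - 1:
            problems.append(f"interval {i}: event recorded before the last interval")
        if i > 0 and start != rows[i - 1][1]:
            problems.append(f"interval {i}: gap or overlap after previous stop {rows[i - 1][1]}")
    return problems
```

For example, `check_subject_intervals([(0, 45, 0), (45, 120, 1)])` returns an empty list, while moving the event to the first interval is flagged.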

EdM