I want to generate right-censored data, and I want to be able to pass a parameter to a function that dictates what percentage of the data will be censored. I have found this R package:

http://www.inside-r.org/node/218748 simple.surv.sim {survsim}

However, it seems to generate the censoring at random. I can compute the proportion of censored observations from the data frame that simple.surv.sim returns, but only after the data have been generated. How would one get a particular proportion of the data to be censored? [Note: I will be using the Weibull distribution as the distribution of choice.]

Hope someone can help.

Palu
  • There are obvious ad hoc ways of doing it, like just randomly selecting some proportion of the data and censoring them somehow. Can you give some more details about the data you're trying to generate? – dsaxton Jul 18 '15 at 18:03
  • Hi there, I don't have any particular data in mind per se. Any context would be OK. For example, the time to event could be the occurrence of cancer after, let's say, 365 days, and we have 200 patients. – Palu Jul 18 '15 at 18:07
  • It is trivial to censor a fixed proportion of the data: censor them at the corresponding empirical quantile. Unfortunately, no matter how you go about this, you wind up with a censoring mechanism in which the censoring threshold is not independent of the data, which likely is not what is assumed by any models you might be using to analyze the data. Could you therefore expand on why you want to do this? What is the purpose? – whuber Jul 18 '15 at 19:13
  • Hi whuber, thanks for your input. The purpose is to be able to generate data that one can use to practice survival analysis. This is one way to avoid relying on real data sets for survival analysis, of which there are not many. – Palu Jul 18 '15 at 19:25
  • In that case there is no apparent need to censor exactly a given proportion of the data: that would be unrealistic. BTW, there are tens of thousands of real datasets for survival analysis available: look in the engineering risk and medical studies literature. – whuber Jul 19 '15 at 18:14
  • Hi whuber, thanks for pointing out the other article, I just found it about 12 hours ago. I have had difficulty finding good survival data. And I am not looking for things to be realistic per se. I want to generate this type of data to analyze how changing the proportion of censored data affects survival regression. – Palu Jul 19 '15 at 21:55
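
The empirical-quantile idea from the comments can be sketched as follows, in Python with only the standard library. This is a minimal illustration, not anything provided by survsim: the function names and the parameter values (shape 1.5, scale 365, 200 subjects, 30% censoring) are assumptions chosen to match the example in the thread. The first approach censors exactly the requested fraction, but its threshold depends on the sample, which is the caveat raised above. For contrast, the second approach fixes the threshold in advance from the Weibull survival function S(t) = exp(-(t/scale)^shape); solving S(c) = p gives c = scale * (-ln p)^(1/shape), which yields the proportion p only in expectation.

```python
import math
import random


def simulate_weibull(n, shape, scale, rng):
    """Draw n Weibull event times (weibullvariate takes scale first, then shape)."""
    return [rng.weibullvariate(scale, shape) for _ in range(n)]


def censor_at(times, threshold):
    """Right-censor at a threshold; returns (observed_time, event) pairs, event=0 if censored."""
    return [(min(t, threshold), 1 if t <= threshold else 0) for t in times]


def censor_at_empirical_quantile(times, prop_censored):
    """Censor exactly round(n * prop_censored) of the largest times.

    The threshold is the largest time that stays observed, so it depends on the data.
    """
    n = len(times)
    n_cens = round(n * prop_censored)
    if not 0 <= n_cens < n:
        raise ValueError("prop_censored must leave at least one observed event")
    threshold = sorted(times)[n - n_cens - 1]
    return censor_at(times, threshold)


def weibull_threshold(shape, scale, prop_censored):
    """Fixed censoring time c with P(T > c) = prop_censored for a Weibull T."""
    if not 0 < prop_censored < 1:
        raise ValueError("prop_censored must be strictly between 0 and 1")
    return scale * (-math.log(prop_censored)) ** (1.0 / shape)


# Illustrative values only (hypothetical study: 200 patients, 30% censored).
rng = random.Random(2015)
event_times = simulate_weibull(200, shape=1.5, scale=365.0, rng=rng)

exact = censor_at_empirical_quantile(event_times, 0.30)
fixed = censor_at(event_times, weibull_threshold(1.5, 365.0, 0.30))

print("empirical-quantile censoring:", sum(1 for _, e in exact if e == 0), "of 200 censored")
print("fixed-threshold censoring:   ", sum(1 for _, e in fixed if e == 0), "of 200 censored")
```

With the empirical-quantile version the censored count equals the requested one in every sample. With the fixed threshold it varies from sample to sample around the target, but the censoring time does not depend on the simulated data.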
