
I ran an interval-censored survival analysis in R, JMP, and SAS. All three gave me identical graphs, but the tables differed a bit. This is the table JMP gave me:

| Start Time | End Time | Survival | Failure | SurvStdErr |
|-----------:|---------:|---------:|--------:|-----------:|
| .          | 14.0000  | 1.0000   | 0.0000  | 0.0000     |
| 16.0000    | 21.0000  | 0.5000   | 0.5000  | 0.2485     |
| 28.0000    | 36.0000  | 0.5000   | 0.5000  | 0.2188     |
| 40.0000    | 59.0000  | 0.2000   | 0.8000  | 0.2828     |
| 59.0000    | 91.0000  | 0.2000   | 0.8000  | 0.1340     |
| 94.0000    | .        | 0.0000   | 1.0000  | 0.0000     |

This is the table SAS gave me:

| Obs | Lower | Upper | Probability | Cum Probability | Survival | Prob Std. Error |
|----:|------:|------:|------------:|----------------:|---------:|----------------:|
| 1   | 14    | 16    | 0.5         | 0.5             | 0.5      | 0.1581          |
| 2   | 21    | 28    | 0.0         | 0.5             | 0.5      | 0.1581          |
| 3   | 36    | 40    | 0.3         | 0.8             | 0.2      | 0.1265          |
| 4   | 91    | 94    | 0.2         | 1.0             | 0.0      | 0.0             |

R's output was shorter. The graph was identical, and the output was:

Interval (14,16] -> probability 0.5
Interval (36,40] -> probability 0.3
Interval (91,94] -> probability 0.2
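
For reference, output of this shape can be reproduced in R along these lines; a minimal sketch using the `interval` package and toy endpoints rather than my actual data:

```r
library(interval)   # provides icfit(), the NPMLE (Turnbull) estimate

# Toy endpoints only -- not my actual data.
# Each event is known to lie in (L, R]; R = Inf marks a right-censored subject.
L <- c(2, 2, 14, 28, 36, 59, 94)
R <- c(16, 16, 16, 36, 40, 91, Inf)

fit <- icfit(L, R)
summary(fit)   # prints the intervals and the probability mass assigned to each
plot(fit)      # the step-function survival curve
```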

My problems are:

  1. I don't understand the differences between the three outputs.
  2. I don't know how to interpret the results.
  3. I don't understand the logic behind the method.

If you could assist me, especially with the interpretation, it would be a great help. I need to summarize the results in a couple of lines, and I'm not sure how to read the tables.

I should add that, unfortunately, the sample had only 10 observations of intervals in which events happened. I didn't want to use midpoint imputation, which is biased. But I have two intervals of (2,16], and yet the first failure appears at 14 in the analysis, so I don't understand how the method arrives at that.
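
For contrast, midpoint imputation would mean pretending the event time is the middle of each interval and running an ordinary right-censored Kaplan-Meier; a sketch of that (toy endpoints again, `survival` package):

```r
library(survival)   # ordinary Kaplan-Meier machinery

# Toy endpoints again; NA on the right marks a right-censored subject.
L <- c(2, 2, 14, 28, 36, 59, 94)
R <- c(16, 16, 16, 36, 40, 91, NA)

event <- !is.na(R)                      # FALSE = right-censored
time  <- ifelse(event, (L + R) / 2, L)  # midpoint for events, last visit for censored

km <- survfit(Surv(time, event) ~ 1)    # standard (right-censored) Kaplan-Meier
summary(km)
```

This is exactly the imputation step I wanted to avoid, since it treats an unknown event time as if it were known.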

Graph:

[image: the survival curve, identical across the three packages]

  • Actually, `R` and `SAS` completely agree with each other: `SAS` includes 4 intervals instead of 3, *but* note that the CDF does not change in interval 2! In fact, the `JMP` results agree as well, but are a little harder to follow. – Cliff AB Apr 07 '16 at 01:12

1 Answer

The most important issue here is understanding censoring and which type of censoring applies in your situation. So for your problems 1 and 3, start from the context of your data; that context determines the appropriate censoring assumption.

The R output says that the first group of failures falls in the interval (14,16]. This doesn't mean the failure occurred at 14. It means that R assumed the data to be right-censored, which is the most common assumption in survival analysis. Why is the failure reported as a range (14,16] rather than just a probability at 16? It's likely due to confidence-limit estimation.

Interpreting the R result, which matches the SAS output: the probability of failure at t=16 is 50%, at t=40 it is 30%, and at t=94 it is 20%.
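
To see how the numbers hang together, the cumulative failure probability is just the running sum of the per-interval probabilities, and survival is its complement; a quick check in R using the values from the SAS table above:

```r
p <- c(0.5, 0.0, 0.3, 0.2)  # "Probability" column from the SAS table
F <- cumsum(p)              # "Cum Probability": 0.5 0.5 0.8 1.0
S <- 1 - F                  # "Survival":        0.5 0.5 0.2 0.0
```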

Rather than trying to understand the issue by juggling three analysis packages, pick one, understand the censoring options it lets you set, and use it. A good link for R: here
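
In R, for instance, the censoring assumption is encoded in how you build the response with `survival::Surv()`; a sketch of the main options (toy values only):

```r
library(survival)

Surv(time = 14, event = 1)                       # event observed at 14 (right-censoring is the default type)
Surv(time = 14, event = 0)                       # right-censored at 14 (no event observed)
Surv(time = 14, time2 = 16, type = "interval2")  # interval-censored: event somewhere in (14, 16]
Surv(time = 94, time2 = NA, type = "interval2")  # interval2 coding of right-censoring at 94
```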

Gary Chung
  • The context of the question is a relapse of a condition. I am interested in the time of the relapse. Unfortunately, the follow-up visits are not daily, so if the relapse happened by visit number 4, I do not know where between visit 3 (+ a day) and visit 4 it happened. The censoring is right-censoring, and among the 10 observations only 1 was censored, as (94, infinity). Would it be correct to say that 50% survived more than 28 days? – user45442 May 13 '14 at 05:40
  • And one more question: since interval censoring is based on unknown data, how efficient is the estimation based on only 10 observations? Are the estimates really better than the ones I would get with the biased midpoint-imputation approach, in which I take the midpoint of every interval to represent it? – user45442 May 13 '14 at 05:47
  • I wouldn't say 50% survived 28 days or more, since you don't know that, for the very reason of interval uncertainty you pointed out. You can say that 50% survived to Day 16. Regarding the interval issue, you bring up a very real problem of data imprecision. Using a midpoint imputation method makes sense, but the widely accepted approach for your situation is the [Kaplan Meier estimation](http://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator). – Gary Chung May 14 '14 at 00:08
  • @GaryChung: you are completely ignoring the **interval** censoring aspect of this data. – Cliff AB Apr 07 '16 at 01:09