Converting survival analysis by a continuous variable to categorical so as to find level of most significant difference

Question

I am doing a survival analysis by a continuous variable using:

fit <- coxph(Surv(Survival, Dead) ~ HighestKi67, data = SurvivalbyHighestKi67)
fit

which returns:

               coef exp(coef) se(coef)     z        p
HighestKi67 0.12152   1.12921  0.02582 4.706 2.53e-06

Likelihood ratio test=15.9  on 1 df, p=6.666e-05
n= 160, number of events= 56

I want to represent this graphically, so I thought I would split the continuous variable into two categories (less than X vs. greater than X) and show them on a Kaplan-Meier (K-M) plot.

So my question is this: other than trial and error, how could I find the cut-off value (X) that would maximise the significant difference between the two categories?

More generally, is there another way to graphically represent survival analysis of a continuous variable?

With thanks,

EdM mentions the rms package authored by Frank Harrell. Although Frank does not sanction categorizing predictors as you suggest, he does present the hazard ratios for comparison of individuals at the 25th and 75th percentiles of predictor distributions. It would be somewhat similar to comparing the lowest tertile (lower third) to the highest tertile (upper third) for the predictor. — DWin, Nov 25 '20 at 01:12

score 2 · Accepted Answer · answered Nov 19 '20 at 13:04

Don't try to "maximize the significant difference between the two categories." Much can be lost when you categorize a continuous predictor. The best cutoff found from your data probably wouldn't be the best cutoff in a new set of data.

There are two approaches that can accomplish what you need without leading to unrealistic expectations about performance on new data.

For display only, not for establishing "significance," you can simply choose a reasonable cutoff that illustrates the survival difference. The median predictor value is often used. If there's bi-modality in the distribution of predictor values, you could use a cutoff at the dip in the distribution between the two modes. Explain clearly to your audience what you did, and emphasize that the "significance" of your result is based on the continuous modeling, not on this display.
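A minimal sketch of the median-split display might look like the following. The data frame here is simulated as a stand-in (the values are made up, since the asker's data are not available); only the column names Survival, Dead, and HighestKi67 come from the question, and Ki67Group is a name introduced for illustration.

```r
library(survival)

# ASSUMPTION: simulated stand-in data with the question's column names.
# Replace this block with the real SurvivalbyHighestKi67 data frame.
set.seed(1)
n <- 160
HighestKi67 <- runif(n, 1, 20)
event_time  <- rexp(n, rate = 0.01 * exp(0.12 * HighestKi67))
censor_time <- runif(n, 0, 60)
SurvivalbyHighestKi67 <- data.frame(
  HighestKi67 = HighestKi67,
  Survival    = pmin(event_time, censor_time),
  Dead        = as.integer(event_time <= censor_time)
)

# Dichotomize at the median, for display only (not for testing significance)
cutoff <- median(SurvivalbyHighestKi67$HighestKi67)
SurvivalbyHighestKi67$Ki67Group <- factor(
  ifelse(SurvivalbyHighestKi67$HighestKi67 > cutoff,
         "Above median", "At or below median"),
  levels = c("At or below median", "Above median")
)

# Kaplan-Meier curves for the two display groups
km <- survfit(Surv(Survival, Dead) ~ Ki67Group, data = SurvivalbyHighestKi67)
plot(km, lty = 1:2, xlab = "Time", ylab = "Survival probability")
legend("bottomleft", legend = levels(SurvivalbyHighestKi67$Ki67Group),
       lty = 1:2, bty = "n")
```

The p-value to report alongside such a plot would still be the one from the continuous coxph fit, not a log-rank test on the two display groups.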

A second approach is to plot the hazard ratio and its confidence limits as a continuous function of the predictor value. There are software tools that can help with that display, like the Predict() function in the rms package in R. That has the advantage of displaying precisely what you modeled.
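As a rough sketch of this second approach, assuming the rms package is installed, the model can be refit with cph() and passed to Predict(). The data are again simulated stand-ins with the question's column names, and the object names dd, fit_rms, and pred_hr are made up for illustration.

```r
library(survival)
library(rms)

# ASSUMPTION: simulated stand-in data with the question's column names.
# Replace this block with the real SurvivalbyHighestKi67 data frame.
set.seed(1)
n <- 160
HighestKi67 <- runif(n, 1, 20)
event_time  <- rexp(n, rate = 0.01 * exp(0.12 * HighestKi67))
censor_time <- runif(n, 0, 60)
SurvivalbyHighestKi67 <- data.frame(
  HighestKi67 = HighestKi67,
  Survival    = pmin(event_time, censor_time),
  Dead        = as.integer(event_time <= censor_time)
)

# rms needs the predictor distributions to choose plotting ranges
# and reference values
dd <- datadist(SurvivalbyHighestKi67)
options(datadist = "dd")

# Same linear Cox model as the coxph() fit, in rms form
fit_rms <- cph(Surv(Survival, Dead) ~ HighestKi67,
               data = SurvivalbyHighestKi67, x = TRUE, y = TRUE)

# Hazard ratio relative to the reference (median) predictor value,
# with 95% confidence limits, across the range of HighestKi67
pred_hr <- Predict(fit_rms, HighestKi67, ref.zero = TRUE, fun = exp)
print(plot(pred_hr, ylab = "Hazard ratio vs. median HighestKi67"))

# Hazard ratio comparing the 75th with the 25th percentile of the predictor
summary(fit_rms)
```

With ref.zero = TRUE the curve passes through a hazard ratio of 1 at the reference value, so the plot shows the fitted continuous relationship directly instead of a dichotomized version of it.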
