Continuous Variable No Longer Follows PH Assumption When Categorized

Question

I am creating a Cox regression model with multiple covariates.

I have two models:

model A contains my variable of interest in its original continuous form, and
model B contains my variable of interest as a categorical variable with ordinal scale.

After fitting both models, I checked the Schoenfeld residuals of my variables and found that my variable of interest violates the PH assumption in model B, but not in model A.

What should I do in this case? Is this a result of me improperly discretizing my variable?

Thank you!

Don't categorize continuous variables. Any reason you have for doing so is wrong; there are plenty of papers about it. — rep_ho, Nov 12 '19 at 16:17
Mind pointing me to the papers? Also, normally I would not, and I am perfectly happy with model A, but my study also includes Kaplan-Meier curves and analyses that use the categorical variable. Categorizing would allow me to show results consistent with the KM curves. — John Smith, Nov 12 '19 at 17:06
Here are two: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1458573/ and https://www.ncbi.nlm.nih.gov/pubmed/16217841. I remember there being many more. I don't know much about KM curves, though. — rep_ho, Nov 13 '19 at 10:33

score 1 · Answer 1 · answered Nov 12 '19 at 20:33

There is a great introduction to the pitfalls of categorizing continuous variables on this page, with links to more information. It seems that you have a continuous predictor that is linearly related to the log-hazard with no (or only a simple) transformation, so you should take advantage of that simplicity as much as possible. Even in practical terms, a 3-level ordinal categorization already uses up more degrees of freedom than the continuous representation does.
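
To make the degrees-of-freedom point concrete, here is a rough sketch of the two model forms. It assumes the categorical version enters as a factor with indicator (dummy) coding and level 1 as the reference; if the ordinal variable were instead coded as a single numeric score it would use one parameter, like the continuous term. The symbols x (predictor of interest), z (the other covariates), and the coefficient names are my own notation, not anything from the question.

```latex
% Model A: continuous predictor x, other covariates z (1 parameter for x)
\log h(t \mid x, z) = \log h_0(t) + \beta x + \gamma^\top z

% Model B: x cut into 3 levels, level 1 as reference (2 parameters for x)
\log h(t \mid x, z) = \log h_0(t)
    + \beta_2 \,\mathbf{1}[x \in \text{level 2}]
    + \beta_3 \,\mathbf{1}[x \in \text{level 3}]
    + \gamma^\top z
```

Model A spends one parameter on x, while model B spends two and also discards the ordering of values within each level.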

In terms of displaying results, you could do the statistical analysis with the continuous predictor and illustrate it with Kaplan-Meier plots based on binning of the predictor values. With multiple covariates affecting outcome, even that type of display is potentially misleading, as the plots hide the associations of the other covariates (and their own relationships with outcome) with your predictor of interest. There's no assurance that empirical Kaplan-Meier curves will properly represent the relationship between your predictor of interest and outcome (although a reviewer might well want to see them).
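
As an illustration of binning purely for display, here is a minimal pure-Python sketch that splits a predictor into tertiles and computes a Kaplan-Meier curve per bin. The toy data, the tertile cut points, and the function names are all hypothetical; in practice you would probably use a survival library rather than hand-rolled code.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def tertile_labels(values):
    """Label each value 'low', 'mid' or 'high' using empirical tertile cut points."""
    ranked = sorted(values)
    n = len(ranked)
    cut1, cut2 = ranked[n // 3], ranked[(2 * n) // 3]
    return ["low" if v < cut1 else "mid" if v < cut2 else "high" for v in values]


# Hypothetical toy data: predictor, follow-up time, event flag (1 = event, 0 = censored).
predictor = [41, 45, 48, 52, 55, 58, 61, 64, 67]
followup = [30, 28, 25, 22, 20, 15, 12, 9, 6]
event = [0, 1, 0, 1, 1, 0, 1, 1, 1]

labels = tertile_labels(predictor)
km_by_bin = {}
for group in ("low", "mid", "high"):
    idx = [i for i, g in enumerate(labels) if g == group]
    km_by_bin[group] = kaplan_meier([followup[i] for i in idx],
                                    [event[i] for i in idx])
```

The bins here serve only the plot; the inference would still come from the model with the continuous predictor, and these unadjusted curves ignore the other covariates.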

Perhaps a better choice would be to plot predicted survival curves, with confidence intervals, based on the Cox model. If there is a set of covariate values that is consistent with a reasonable range of your predictor of interest, you could set the other covariates to that set of values and show predicted survival curves for 3 or so values of your predictor of interest that span that range.
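
A minimal sketch of that idea, using the standard Cox relationship S(t | x) = S0(t) ^ exp(beta . (x - x_ref)). Every number below (the baseline survival values, the coefficients, the reference covariate values) is invented for illustration; in a real analysis they would come from the fitted model, and the confidence intervals, which need the model's variance estimates, are left out here.

```python
import math


def predicted_survival(baseline_surv, beta, x, x_ref):
    """Cox-model survival at covariates x, given baseline survival at x_ref."""
    lin_pred = sum(b * (xi - ri) for b, xi, ri in zip(beta, x, x_ref))
    hazard_ratio = math.exp(lin_pred)
    return [s0 ** hazard_ratio for s0 in baseline_surv]


# Hypothetical values, not taken from any real fit.
times = [0, 12, 24, 36]                  # months
baseline_surv = [1.0, 0.90, 0.80, 0.70]  # S0(t) at the reference covariates
beta = [0.03, 0.50]                      # [predictor of interest, other covariate]
x_ref = [50.0, 0.0]                      # covariate values that S0 refers to

# Hold the other covariate fixed and vary only the predictor of interest.
curves = {}
for value in (40.0, 50.0, 60.0):
    curves[value] = predicted_survival(baseline_surv, beta, [value, 0.0], x_ref)
```

Plotting each entry of curves against times gives model-based curves for three values of the predictor, with the other covariate held at a single stated value instead of varying unseen between groups.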
