
I have a Cox proportional hazards model with 6 covariates to determine OS. I am now trying to simplify this model by taking some of these covariates out. The model is intended for a wide audience, so I'm assuming not everyone will be using all 6 variables. I want to inform the user of the cost of not entering all the variables by giving them some sort of "% of loss" or "accuracy" figure. I did the following.

I fit the whole model and several 1-missing-variable models, then computed the RMSE of several quantities. The thing is, I don't know how informative or "correct" this is.

I computed RMSE for:

A) The predictions of both models (so I obtain the mean error of the partial model relative to the whole model):

library(rms)  ## for cph(), rcs(), survest(), validate(), calibrate()

fitfull <- cph(Surv(months, cens) ~ rcs(var1,4) + rcs(var2,4) + rcs(var3,4) + var4 + var5 + rcs(var6,4), data=df, x=TRUE, y=TRUE, surv=TRUE)
predsfull <- predict(fitfull)  ## linear predictors from the full model

fitpart <- cph(Surv(months, cens) ~ rcs(var1,4) + rcs(var2,4) + rcs(var3,4) + var4 + var5, data=df, x=TRUE, y=TRUE, surv=TRUE)  ## var6 is now gone
predspart <- predict(fitpart)  ## linear predictors from the partial model

sqrt(mean((predspart - predsfull)^2))  ## RMSE between the two sets of linear predictors

B) The same as A, but instead of predict() I computed the RMSE between survest(fitfull)$surv and the corresponding survival estimates from the partial model (sketch below).
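Concretely, something like this (a sketch only: the 12-month evaluation time is an arbitrary choice for illustration, and fitfull/fitpart are the fits defined above, which were made with surv=TRUE):

sfull <- survest(fitfull, times=12)$surv  ## predicted S(12) for each subject, full model
spart <- survest(fitpart, times=12)$surv  ## predicted S(12) for each subject, partial model
sqrt(mean((spart - sfull)^2))             ## RMSE on the survival-probability scale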

C) The RMSE of the bootstrap-corrected C-statistic given by the validate() function of the rms package (sketch below).
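Again roughly like this (a sketch; B = 300 bootstrap repetitions is an arbitrary choice):

vfull <- validate(fitfull, B=300)  ## bootstrap-corrected indices, full model
vpart <- validate(fitpart, B=300)  ## bootstrap-corrected indices, partial model
## C = 0.5 + |Dxy|/2; Dxy can be reported with a negative sign for Cox fits
Cfull <- 0.5 + abs(vfull["Dxy", "index.corrected"])/2
Cpart <- 0.5 + abs(vpart["Dxy", "index.corrected"])/2
abs(Cfull - Cpart)  ## with a single partial model the "RMSE" reduces to this absolute difference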

Is my approach correct? I am somewhat confused by the quantities I am obtaining: since I'm using RMSE, I get the same units as the response, so in all scenarios I am getting probabilities. Therefore I am reporting something like "you are sacrificing 0.023% of discriminative power using a partial model vs a full model".

Is any of that making sense?

PS: The code is just an example; I'm doing this for all 6 covariates, and even for 2 covariates at a time. So the fact that some of them are non-linear has nothing to do with removing a covariate; I just showed the closest form of my model I could.

N00b

1 Answer


One problem with comparing full and partial models in general is the implicit assumption that the full model is better. If your full model is overfit, that won't be the case. (It seems that you are careful about such things, but answers here are ideally written for a general audience.) Another problem is just what the predict() and survest() functions are returning for making the comparisons. From the way you wrote the predict() code, for example, it looks like you would be looking at mean-square differences in linear predictor values; those aren't necessarily the easiest things for a wide audience to interpret.

What you presumably care about the most is how well each of the models (whether full or reduced) works on the data. For that, your choice of bootstrap-validated C-index makes the most sense among the comparisons you suggest. It's not clear what you gain from trying to calculate the RMSE of the C-statistic, as the C-statistic itself seems to be of most interest and is easy to interpret: it's the fraction of pairs of cases whose actual order in time of events agrees with the order predicted by the model.

That said, Harrell points out that the C-index is not a very sensitive way to compare models. Comparisons of models based on log-likelihood considerations are more sensitive. The validate() function in the rms package, with its bootstrapping, provides several such optimism-corrected indices of model quality: Nagelkerke's $R^2$, a discrimination index $D$, an unreliability index $U$, and an overall quality index $Q = D - U$, as described in Chapter 20 of Harrell's Regression Modeling Strategies, 2nd edition. You could consider choosing one of them to display differences among your full and reduced models. These indices, however, can depend on the censoring distribution of the data, so some question their reliability. A search on this site for Harrell and C-index provides links to extensive discussion; the discussions of comparisons among logistic regression models are generally applicable to Cox models, too.
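For example (a sketch, assuming vfull and vpart hold validate() results for your full and reduced cph() fits, e.g. vfull <- validate(fitfull, B=300)):

vfull[c("R2", "D", "U", "Q"), "index.corrected"]  ## optimism-corrected Nagelkerke R2, D, U, and Q = D - U
vpart[c("R2", "D", "U", "Q"), "index.corrected"]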

Comparing how well calibrated the models are, with the calibrate() function in the rms package, can also be useful. That function uses flexible adaptive hazard regression, not assuming linearity or proportional hazards, to test model predictions against the data at a fixed point in time. In addition to an overall bootstrap-adjusted calibration curve, the function reports the mean absolute error and the 0.9 quantile of absolute error* in predicted survival probabilities. Those estimates of precision might be the easiest for a wide audience to understand.
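For example (a sketch only: the evaluation time u = 12 months and B = 300 are arbitrary choices, and calibrate() for a cph fit needs the model to have been fit with x=TRUE, y=TRUE, surv=TRUE and time.inc equal to u):

fit12 <- cph(Surv(months, cens) ~ rcs(var1,4) + rcs(var2,4) + rcs(var3,4) + var4 + var5 + rcs(var6,4), data=df, x=TRUE, y=TRUE, surv=TRUE, time.inc=12)
cal12 <- calibrate(fit12, u=12, B=300)  ## default cmethod='hare' is the flexible adaptive hazard regression mentioned above
plot(cal12)  ## bootstrap-corrected calibration curve; the plot also displays the mean and 0.9 quantile of absolute error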


*For the 0.9 quantile use the value displayed on a plot of the object returned by calibrate(); last I looked there seemed to be an error in the corresponding value returned by the print() function for those objects.

EdM
  • Thanks for your input! My question is motivated by Harrell's RMS book. As he suggests (chapter 5.5.2), "Our approach is to consider the full model fit as the “gold standard” model, especially the model from which formal inferences are made. We then proceed to approximate this full model to any desired degree of accuracy. For any approximate model we calculate the accuracy with which it approximates the best model". That is what I'm going for. Now, I'm aware of what Harrell thinks about comparing two models by their C-statistics, and he suggests using RMSE or MSE to assess that accuracy (5.7). – N00b Nov 15 '19 at 14:51
  • Also, I just checked calibration, and everything seems fine. The only thing that changes a little bit is the C-statistic (which we "can't" use for comparisons), so I'm stuck with this. I have a "best" model (the full one, not only because it's the one we use to compute predictions but also because LR tests suggest it), but I can't assess by how much it's better. All models seem to be valid in terms of overfitting (which is fine, as I want to adapt the model to various needs). So if I'm interpreting Harrell right, I need a way of reporting the accuracy with which each partial model approximates the full one. – N00b Nov 15 '19 at 14:58
  • @N00b the discussion of approximating the full model in section 5.5.2 seems to be in the context of least-squares, for which MSE is appropriate. In that case, $R^2$ and MSE give equivalent assessments of model fit. For a Cox model, the analogous approach would be to use the (validated) Nagelkerke $R^2$ values. Those have potential problems with censored data, as noted, but you could present the results and discuss the potential limitations. – EdM Nov 15 '19 at 15:12
  • @N00b you aren't prohibited from using C-index, it's just not very sensitive. I suspect that the accuracy in event-time ordering provided by the C-index might be just what your "wide audience" would be interested in as they consider how many variables to include in their own related studies. You could consider presenting both the $R^2$ and C-index values. The mean and 0.9 quantiles of absolute error in predicted survival probabilities at a critical time of interest for the different models might also be helpful to that audience. – EdM Nov 15 '19 at 15:17
  • Thanks @EdM! I think I will take your advice and go with the C-statistic/absolute error. I don't think `R^2` is that informative because of the survival nature of the data (as expected, I'm getting fairly low values which aren't very informative and can even be confusing). I think I took the C-statistic "prohibition" too far. – N00b Nov 18 '19 at 07:49