I am using a Cox proportional hazards model in R to run a survival analysis on a number of distinct, non-nested covariates (e.g. Age, Blood Type, Cancer):
A, B, C, D, E
When I fit a model with several covariates and test the omnibus null hypothesis:
surv ~ A + B + C + D
none of the covariate effects is significant, because relatively few subjects have measurements for every covariate. However, when I fit separate Cox models on single covariates or other combinations of covariates:
surv ~ A
surv ~ A + C
surv ~ B + D
I get significant effects, because the sample is larger (i.e. the model discards fewer observations because of missing values).
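The shrinking sample comes from complete-case (listwise) deletion. By default, a `coxph` fit in R drops every subject that has a missing value in any covariate of its formula. The following is a minimal Python sketch of that count. It assumes each subject is stored as a dict with `None` marking a missing measurement, a hypothetical representation chosen only for illustration. The sketch shows why `surv ~ A + B + C + D` can never use more subjects than `surv ~ A`.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical data layout: one dict per subject, covariate name -> value,
# with None marking a missing measurement.
Row = Mapping[str, Optional[float]]


def complete_case_count(rows: Sequence[Row], covariates: Sequence[str]) -> int:
    """Count subjects with a non-missing value for every covariate listed.

    Only these subjects are usable by a complete-case fit, so adding a
    covariate to the formula can keep this number the same or shrink it,
    but never increase it.
    """
    return sum(
        all(row.get(name) is not None for name in covariates)
        for row in rows
    )
```

Because each model in the question is fit on its own complete-case subset, the smaller models are effectively estimated on different (larger) groups of subjects than the full model.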
What I'm having difficulty understanding is how to do the following:
- Comparing the different Cox models for the best fit: e.g. is `surv ~ A + B + D` a better model than `surv ~ A + C`? Should I be comparing the likelihood ratio, Wald, or score (logrank) test statistics?
- Is it possible to run every possible combination of covariates to determine the best model? I have about 15 covariates.
- More broadly, is this the best approach to optimizing for both significant covariates and overall model "cost"? I will be attaching a cost to each distinct Cox model: e.g. using covariates `A + B + C` costs \$100, using `A + B` costs \$75, and using only `A` costs \$10. I'd like to compare the cost of each combination of covariates against the accuracy of the corresponding Cox model.
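The bookkeeping behind the last three questions can be sketched in Python, as a language-neutral illustration. The Cox fits themselves would still come from R, e.g. `survival::coxph`. With 15 covariates there are 2^15 - 1 = 32,767 non-empty subsets, so enumerating them is trivial. The sketch assumes each candidate model has already been reduced to a cost and a single score where higher is better. Examples are a concordance index or negated AIC (AIC = -2 log L + 2k, with k the number of coefficients). It further assumes every score was computed on the same set of subjects, because likelihood-based criteria are not comparable across fits that drop different numbers of observations. All function names and the `(cost, score)` representation are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Sequence, Tuple


def all_subsets(covariates: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every non-empty combination of covariates (2**k - 1 of them)."""
    for size in range(1, len(covariates) + 1):
        yield from combinations(covariates, size)


def formula(subset: Tuple[str, ...]) -> str:
    """Render a subset as an R-style formula string, e.g. 'surv ~ A + C'."""
    return "surv ~ " + " + ".join(subset)


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion from a (partial) log-likelihood; lower is better."""
    return -2.0 * log_lik + 2.0 * n_params


def pareto_front(
    models: Dict[FrozenSet[str], Tuple[float, float]],
) -> List[FrozenSet[str]]:
    """Return the covariate sets not dominated on (cost, score).

    `models` maps a covariate set to (cost, score): lower cost is better,
    higher score is better. A model is dominated if some other model costs
    no more, scores at least as well, and is strictly better on one of them.
    """
    front = []
    for key, (cost, score) in models.items():
        dominated = any(
            c <= cost and s >= score and (c < cost or s > score)
            for other, (c, s) in models.items()
            if other != key
        )
        if not dominated:
            front.append(key)
    return front
```

The models on the Pareto front are the only ones worth considering for a cost-versus-accuracy trade-off. Any other model is matched or beaten by a cheaper or more accurate alternative.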
Thanks very much for your help!