Comparing Cox proportional hazards models - AIC?

Question

I have some purely empirical data derived from texture analysis of radiology images of lung cancer. As a result of post-processing, there are two nearly identical datasets of 53 cases each, differing only in how the tumours have been delineated (segmented). One delineation method is called 'freehand' and the other 'flab'.

I have modelled survival of the 53 cases using the texture variable 'entropy'. This gives two univariate Cox proportional hazards models, one for manual (freehand) delineation and one for FLAB. However, I am finding it hard to decide which model is better. In the post "How to interpret and compare models in Cox regression?", someone suggested using AIC, so I have included it in the output below. I would like to know whether the two models are equivalent (is there a significance test to compare the AICs of two models?) and, if so, which statistical measures should be reported to support this claim.

Thanks.

Below is the R output for each model:

FLAB model

Call:
coxph(formula = y ~ entropyflab)

  n= 53, number of events= 34 

                coef exp(coef) se(coef)      z Pr(>|z|)  
entropyflab -0.15838   0.85353  0.06505 -2.435   0.0149 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

            exp(coef) exp(-coef) lower .95 upper .95
entropyflab    0.8535      1.172    0.7514    0.9696

Concordance= 0.609  (se = 0.056 )
Rsquare= 0.089   (max possible= 0.984 )
Likelihood ratio test= 4.93  on 1 df,   p=0.02635
Wald test            = 5.93  on 1 df,   p=0.01491
Score (logrank) test = 6.1  on 1 df,   p=0.01355

FLAB AIC: 215.9091

Freehand model

Call:
coxph(formula = y ~ entropyfh)

  n= 53, number of events= 34 

             coef exp(coef) se(coef)      z Pr(>|z|)  
entropyfh -0.1792    0.8359   0.0761 -2.355   0.0185 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

          exp(coef) exp(-coef) lower .95 upper .95
entropyfh    0.8359      1.196    0.7201    0.9704

Concordance= 0.655  (se = 0.056 )
Rsquare= 0.085   (max possible= 0.984 )
Likelihood ratio test= 4.72  on 1 df,   p=0.02975
Wald test            = 5.55  on 1 df,   p=0.01852
Score (logrank) test = 5.76  on 1 df,   p=0.01644

Freehand AIC: 216.1183

I've voted to move this to Cross Validated, as it is about interpretation of results. The model with the lowest AIC is interpreted as the best model; however, your models are very similar in both AIC and coefficient size, so they are not really distinguishable. That is perhaps expected since, from your description, they are measuring the same thing. — user20650, Jan 06 '17 at 12:53
Didn't realise I was posting on Stack Overflow. Is there any way I can move it sooner? — Maelstorm, Jan 06 '17 at 13:06
I'm voting to close this question as off-topic because it is not about programming. — , Jan 06 '17 at 13:17

IWS · Accepted Answer · 2017-01-06T15:51:43.340

For the Cox proportional hazards model the baseline hazard (i.e. the 'intercept') is not estimated, so the likelihood is only a partial one. Even so, nested Cox models can be compared with a likelihood ratio test (LRT) to test for a significant difference in model fit.

Akaike's Information Criterion (AIC) also depends on the likelihood, but in addition on the number of predictors/parameters/degrees of freedom used by the model. It is defined as $\mathrm{AIC} = 2k - 2\ln(L)$, where $k$ is the number of degrees of freedom used and $L$ is the (partial) likelihood of the model. Lower AIC values indicate a better model.
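As a sanity check on the question's output, the definition above can be inverted to recover each model's maximised partial log-likelihood from its printed AIC. A minimal sketch, assuming $k = 1$ for both univariate fits and that both use all 53 cases (as the output indicates); the helper names are hypothetical:

```python
# Recover the maximised partial log-likelihoods from the AICs printed in the question,
# using AIC = 2k - 2 ln(L) with k = 1 (one coefficient per model; the Cox baseline
# hazard is not estimated, so it contributes no parameters).

K = 1  # estimated coefficients in each univariate Cox model

# (AIC, likelihood-ratio statistic) copied from the R output in the question
reported = {
    "FLAB": (215.9091, 4.93),
    "Freehand": (216.1183, 4.72),
}


def loglik_from_aic(aic: float, k: int) -> float:
    """Invert AIC = 2k - 2 ln(L) to obtain ln(L)."""
    return k - aic / 2.0


def null_loglik(loglik_model: float, lr_stat: float) -> float:
    """LR = 2 * (ln L_model - ln L_null), hence ln L_null = ln L_model - LR / 2."""
    return loglik_model - lr_stat / 2.0


results = {}
for name, (aic, lr) in reported.items():
    ll = loglik_from_aic(aic, K)
    results[name] = (ll, null_loglik(ll, lr))
    print(f"{name}: ln L = {ll:.4f}, implied null-model ln L = {results[name][1]:.4f}")
```

Both implied null-model log-likelihoods come out at about -109.42 (they differ only through rounding of the printed LR statistics), as expected, since the covariate-free model depends only on the shared outcome `y`. Because $k$ is the same in both models, the AIC gap of about 0.21 is just twice the gap in partial log-likelihood.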

The AIC often agrees with the LRT, but it can lead to different conclusions because it applies a fixed penalty of 2 for every parameter/df added to the model, which does not correspond to the threshold used by a significance test.
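To make the disagreement concrete: when the larger of two nested models has one extra parameter, its AIC is lower exactly when the LR statistic exceeds 2 (a p-value of about 0.157), whereas a 5% LRT needs the statistic to exceed about 3.84. With 8 or more extra parameters the ordering flips, because $2 \times \mathrm{df}$ then exceeds the 5% chi-square critical value. A small standard-library sketch; the function names are hypothetical:

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    # With 1 df, X = Z**2, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def compare_nested_one_extra(lr_stat: float, alpha: float = 0.05) -> dict:
    """AIC and LRT decisions when the larger nested model has one extra parameter.

    lr_stat = 2 * (ln L_larger - ln L_smaller), so
    AIC_larger - AIC_smaller = 2 * 1 - lr_stat: AIC prefers the larger model iff lr_stat > 2.
    """
    p = chi2_sf_1df(lr_stat)
    return {
        "p_value": p,
        "lrt_prefers_larger": p < alpha,
        "aic_prefers_larger": lr_stat > 2.0,
    }


# Statistics between 2 and about 3.84 are where the two criteria disagree at alpha = 0.05.
for stat in (1.5, 3.0, 4.93):
    print(stat, compare_nested_one_extra(stat))
```

The value 4.93 is the FLAB model's LR statistic against the null model in the question, and the sketch reproduces its printed p-value of about 0.026.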

In your situation, however, the models are not nested. To my knowledge no (simple) statistical tests are available to compare such models on data fit (if there are, I'd like to learn about them as well!). The AIC is something of an exception, because its correction for the number of parameters makes non-nested models fitted to the same outcome on the same data more comparable.
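One descriptive (not inferential) way to express how close two non-nested AICs are is through Akaike weights: $w_i \propto \exp(-\Delta_i / 2)$ with $\Delta_i = \mathrm{AIC}_i - \min \mathrm{AIC}$, normalised over the candidate models. This is a common summary in the model-selection literature rather than a significance test. A minimal sketch using the AICs from the question; the function name is hypothetical:

```python
import math


def akaike_weights(aics: dict) -> dict:
    """Akaike weights: exp(-delta_i / 2) normalised over the candidate set,
    where delta_i = AIC_i - min(AIC)."""
    best = min(aics.values())
    rel = {name: math.exp(-(aic - best) / 2.0) for name, aic in aics.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


aics = {"FLAB": 215.9091, "Freehand": 216.1183}  # values reported in the question
weights = akaike_weights(aics)
evidence_ratio = weights["FLAB"] / weights["Freehand"]  # equals exp(delta / 2)
print(weights, evidence_ratio)
```

With $\Delta \approx 0.21$ the weights are roughly 0.53 versus 0.47 (an evidence ratio of about 1.1), which matches the impression that the two models are practically indistinguishable. Since both models have $k = 1$, the penalty term cancels and this is purely a comparison of partial likelihoods.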

So, to conclude: no, there is no easy way of comparing the specific AICs using a statistical test. IMHO the model with the lowest AIC would be best, regardless of significance tests.

As a suggestion for further thought (and for smarter people): might bootstrapping the data, fitting the two models in each bootstrap dataset, and then comparing the distributions of AIC be a way to test these two AICs statistically?
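One way the bootstrap idea could be sketched is a paired bootstrap: resample the 53 cases with replacement, refit both univariate models on the same resample, and record the AIC difference each time. This is a heuristic, not a validated test. Everything below is hypothetical illustration rather than a library API: `cox_loglik_score_info`, `fit_cox_aic`, `bootstrap_aic_difference` and `percentile_interval` are made-up names, and the bare-bones Cox fit uses Breslow's handling of ties, whereas R's `coxph` defaults to Efron's, so AICs will differ slightly when event times are tied.

```python
import math
import random


def cox_loglik_score_info(beta, time, event, x):
    """Breslow partial log-likelihood, score and observed information for one covariate."""
    ll = score = info = 0.0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue
        risk = [j for j in range(n) if time[j] >= time[i]]  # still at risk at time[i]
        m = max(beta * x[j] for j in risk)  # subtract the max for numerical stability
        s0 = s1 = s2 = 0.0
        for j in risk:
            w = math.exp(beta * x[j] - m)
            s0 += w
            s1 += w * x[j]
            s2 += w * x[j] ** 2
        ll += beta * x[i] - (m + math.log(s0))
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return ll, score, info


def fit_cox_aic(time, event, x, max_iter=50, tol=1e-9):
    """Newton-Raphson with step halving; returns (beta, AIC) using k = 1."""
    beta = 0.0
    ll, score, info = cox_loglik_score_info(beta, time, event, x)
    for _ in range(max_iter):
        if not info > 1e-12:  # information numerically zero (or NaN): stop
            break
        step = score / info
        new = cox_loglik_score_info(beta + step, time, event, x)
        while new[0] < ll and abs(step) > tol:  # halve the step until the fit improves
            step /= 2.0
            new = cox_loglik_score_info(beta + step, time, event, x)
        beta += step
        ll, score, info = new
        if abs(step) < tol:
            break
    return beta, 2 * 1 - 2 * ll


def bootstrap_aic_difference(time, event, x1, x2, n_boot=200, seed=1):
    """Paired bootstrap of AIC(model on x1) - AIC(model on x2); returns sorted differences."""
    rng = random.Random(seed)
    n = len(time)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        t = [time[i] for i in idx]
        e = [event[i] for i in idx]
        _, aic1 = fit_cox_aic(t, e, [x1[i] for i in idx])
        _, aic2 = fit_cox_aic(t, e, [x2[i] for i in idx])
        diffs.append(aic1 - aic2)
    return sorted(diffs)


def percentile_interval(sorted_vals, level=0.95):
    """Simple percentile interval from sorted bootstrap replicates."""
    last = len(sorted_vals) - 1
    return (sorted_vals[int((1 - level) / 2 * last)],
            sorted_vals[int((1 + level) / 2 * last)])
```

With the real data, `time` and `event` would come from `y`, and `x1`, `x2` from `entropyflab` and `entropyfh`. One could then look at whether the percentile interval of the AIC difference comfortably excludes 0. The fit has no special handling of monotone likelihood (a covariate that perfectly orders failures in some resample), which the `survival` package deals with more carefully.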

score 0 · Answer 2 · answered Jan 07 '17 at 15:23

I think that your problem is one of model selection.

If you used forward selection and stopped after including one variable, you would end up with the FLAB model. Compared with the other univariate model, it is more significant and has a smaller AIC. From the output, there would be no reason not to select this model.

The two AICs tell you that the models are roughly the same in terms of goodness of fit. But you can't rigorously compare the models because they are not nested (as IWS said).

I think you should also fit a model with both FLAB and Freehand entropy. Then you would be able to look at the univariate models nested in the multivariable one, and use likelihood ratio tests or something similar to assess whether a model with FLAB + Freehand is better than a model with FLAB only.
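The likelihood ratio test for such nested models is $2(\ln L_{\text{full}} - \ln L_{\text{reduced}})$ referred to a chi-square distribution with df equal to the number of added parameters. A standard-library sketch with hypothetical function names; in R, `anova(fit_flab, fit_both)` on two nested `coxph` fits performs this test directly.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h**(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(h) * math.exp(-h) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= h / (i + 0.5)  # uses Gamma(i + 3/2) = (i + 1/2) * Gamma(i + 1/2)
    return total


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """LRT for nested models: statistic 2 * (ln L_full - ln L_reduced) on df degrees of freedom."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)
```

For FLAB versus FLAB + Freehand, `df = 1` and the two partial log-likelihoods would come from the fitted models (e.g. `logLik()` in R). If the two entropy measures are highly correlated, as the near-identical datasets suggest, the joint model's coefficients may be unstable, so the comparison is worth interpreting with care.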

score 0 · Answer 3 · answered Jan 20 '20 at 12:22


With everything else being quite similar, the c-index values (Concordance in the model summary) differ and could therefore be used to compare the models. On this basis, the freehand model is better (0.655 vs. 0.609).
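For reference, Harrell's C counts, among usable pairs, how often the subject who fails first also has the higher predicted risk. A simplified sketch (hypothetical function name) in which a pair is usable when the earlier subject had an observed event and a strictly shorter time; its tie handling may differ from the `survival` package's `concordance`:

```python
def harrell_c(time, event, risk):
    """Simplified Harrell's C-index.

    A pair (i, j) is usable when subject i has an observed event and a strictly
    shorter time than j. It is concordant when i also has the higher risk score;
    tied risk scores count one half. Pairs with tied times are skipped here.
    """
    concordant = 0.0
    usable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue
        for j in range(n):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

For a univariate Cox model the risk score is the coefficient times the covariate (e.g. `-0.15838 * entropyflab`). Because C depends only on the ordering of risk scores, here it reflects how well the ordering of entropy values (in the direction of the coefficient's sign) matches the ordering of failure times.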

Seanosapien

121
1
6
