I have some empirical data derived from texture analysis of radiology images of lung cancer. As a result of post-processing, there are two nearly identical datasets of 53 cases each, differing only in how the tumours have been delineated / segmented. One delineation method is called 'freehand' and the other 'flab'.
I have modelled survival of the 53 cases on the basis of the texture variable 'entropy'. This gives two univariate Cox proportional hazards models, one for freehand (manual) delineation and one for flab, but I am finding it hard to decide which model is better. In the post "How to interpret and compare models in Cox regression?", someone suggested using AIC, so I am including it with the output. I would like to know whether the two models are equivalent (is there a significance test for comparing the AIC of two models?) and, if so, which statistical measures should be reported to support this claim.
Thanks.
Using R, below is the output of each model:
FLAB model
Call:
coxph(formula = y ~ entropyflab)
n= 53, number of events= 34
                coef exp(coef) se(coef)      z Pr(>|z|)
entropyflab -0.15838   0.85353  0.06505 -2.435   0.0149 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
            exp(coef) exp(-coef) lower .95 upper .95
entropyflab    0.8535      1.172    0.7514    0.9696
Concordance= 0.609 (se = 0.056 )
Rsquare= 0.089 (max possible= 0.984 )
Likelihood ratio test= 4.93 on 1 df, p=0.02635
Wald test = 5.93 on 1 df, p=0.01491
Score (logrank) test = 6.1 on 1 df, p=0.01355
FLAB AIC:
215.9091
Freehand Cox model:
coxph(formula = y ~ entropyfh)
n= 53, number of events= 34
             coef exp(coef) se(coef)      z Pr(>|z|)
entropyfh -0.1792    0.8359   0.0761 -2.355   0.0185 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
          exp(coef) exp(-coef) lower .95 upper .95
entropyfh    0.8359      1.196    0.7201    0.9704
Concordance= 0.655 (se = 0.056 )
Rsquare= 0.085 (max possible= 0.984 )
Likelihood ratio test= 4.72 on 1 df, p=0.02975
Wald test = 5.55 on 1 df, p=0.01852
Score (logrank) test = 5.76 on 1 df, p=0.01644
Freehand AIC:
216.1183
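For reference, here is a minimal sketch of how output like the above can be produced and how the two AIC values can be put side by side. The data in it are simulated placeholders (my real 53 cases are not included), so the numbers it prints will not match the ones above. The object names `fit_flab`, `fit_fh`, `aic` and `delta_aic` are just illustrative. With the values I reported, the AIC difference is 216.1183 - 215.9091 = 0.2092.

```r
library(survival)

# Simulated placeholder data: NOT the real dataset, only the same shape
# (53 cases, one entropy value per delineation method).
set.seed(1)
n <- 53
entropyflab <- rnorm(n, mean = 10, sd = 2)
entropyfh   <- entropyflab + rnorm(n, sd = 0.5)  # same tumours, other delineation
time   <- rexp(n, rate = 0.05 * exp(-0.15 * (entropyflab - 10)))
status <- rbinom(n, size = 1, prob = 0.65)       # 1 = event, 0 = censored
y <- Surv(time, status)

# One univariate Cox model per delineation method, fitted to the same outcome
fit_flab <- coxph(y ~ entropyflab)
fit_fh   <- coxph(y ~ entropyfh)

summary(fit_flab)
summary(fit_fh)

# AIC of each model and the difference from the smaller of the two
aic <- c(flab = AIC(fit_flab), freehand = AIC(fit_fh))
delta_aic <- aic - min(aic)
print(aic)
print(delta_aic)
```

Both models are fitted to the same 53 cases and the same survival outcome, which is what makes the AIC values comparable at all. They are not nested, however, so the usual likelihood ratio test between models does not apply here.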