I am constructing Cox models that predict survival in a clinical trials cohort.
After speaking to our statistician (who is away at the moment, hence this post), I was advised to take a forward likelihood-ratio-test approach to building Cox survival models: start with a base model and add a term only if it improves the model. The p value comes from the difference in -2 x log partial likelihood between the base model and the extended model, as outlined in the R code below.
I realise that Stata is probably a better fit for this sort of analysis, but i) I don't have easy access to Stata and ii) I am familiar with R (and also have access to SPSS), so with that caveat, the general format of the code I am using is given below.
Conceptually, this makes sense to me when I add binary covariates to the model, but is the approach also appropriate for adding a continuous variable, as outlined below? In particular, I'm not sure that one degree of freedom is correct for this comparison.
y <- 0:1
data <- data.frame(cbind(sample(y, 100, replace=TRUE), runif(100, min=0, max=10), sample(y, 100, replace=TRUE), runif(100, min=0, max=1))) # Simulated example data
colnames(data) <- c("EFS_Status", "EFS_Time", "var1", "Contvar2")
library(survival)
base <- coxph(Surv(EFS_Time, EFS_Status) ~ var1, data=data) # Create base model
lr1 <- -2*base$loglik[2] # -2 * log partial likelihood of base model
extend <- coxph(Surv(EFS_Time, EFS_Status) ~ var1 + Contvar2, data=data) # Extended model
lr2 <- -2*extend$loglik[2] # -2 * log partial likelihood of extended model
pchisq(q=lr1-lr2, df=1, lower.tail=FALSE) # LRT p value; is 1 df correct for continuous variables?
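As a cross-check on the manual calculation, here is a minimal sketch that compares it with the `anova()` method the survival package provides for nested `coxph` fits. It uses the same kind of simulated data as above, so the p value itself is meaningless. The names `lrt_stat`, `lrt_df`, `p_manual` and `cmp` are my own for illustration, and I am assuming the degrees of freedom equal the number of extra coefficients in the extended model, which is one when a continuous covariate enters as a single linear term.

```r
library(survival)

set.seed(42)  # simulated data, for illustration only
n <- 100
data <- data.frame(
  EFS_Status = sample(0:1, n, replace = TRUE),
  EFS_Time   = runif(n, min = 0, max = 10),
  var1       = sample(0:1, n, replace = TRUE),
  Contvar2   = runif(n, min = 0, max = 1)
)

base   <- coxph(Surv(EFS_Time, EFS_Status) ~ var1, data = data)
extend <- coxph(Surv(EFS_Time, EFS_Status) ~ var1 + Contvar2, data = data)

# Manual test statistic: twice the gain in log partial likelihood
lrt_stat <- 2 * (extend$loglik[2] - base$loglik[2])

# Assumed df: number of extra coefficients in the extended model
lrt_df <- length(coef(extend)) - length(coef(base))

p_manual <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)

# Built-in likelihood ratio comparison of the two nested fits
cmp <- anova(base, extend)
print(cmp)
print(c(statistic = lrt_stat, df = lrt_df, p = p_manual))
```

If the two approaches agree, the `Chisq` and `Df` columns in the second row of the `anova()` table should match `lrt_stat` and `lrt_df`.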
Any guidance is most appreciated. Obviously, I could binarize the continuous variable, but I suppose that would throw away information.
Thanks for reading, Ed