10

I've been researching the mice package, and I haven't yet found a way to use the multiple imputations to fit a Cox model and then validate that model with the rms package's validate() function. Here is some sample code of what I have so far, using the veteran data set:

library(rms)
library(survival)
library(mice)

#Start from a fresh copy of the data set (lazy-loaded by the survival package)
veteran=survival::veteran
veteran$trt=factor(veteran$trt,levels=c(1,2))
veteran$prior=factor(veteran$prior,levels=c(0,10))

#Set random data to NA 
veteran[sample(137,4),1]=NA
veteran[sample(137,4),2]=NA
veteran[sample(137,4),7]=NA

impvet=mice(veteran)

#make a CPH for each imputation (mice creates 5 imputations by default)
for(i in seq(5)){
    assign(paste("mod_",i,sep=""),cph(Surv(time,status)~trt+celltype+karno+age+prior,
        data=complete(impvet,i),x=TRUE,y=TRUE))
}

#Now there is a CPH model for mod_1, mod_2, mod_3, mod_4, and mod_5.

Now, if I were just working with one CPH model, I would do this:

validate(mod_1,B=20)

The problem I'm having is how to take the 5 CPH models (1 for each imputation), and be able to create a pooled model that I can then use with rms. I know that the mice package has some built-in pooling functions but I don't believe they work with the cph object in rms. The key here is being able to still use rms after pooling. I looked into using Harrell's aregImpute() function but I'm having some trouble following the examples and documentation; mice seems simpler to use.

gung - Reinstate Monica
JJM
  • By the way: moderators, if you think this Q belongs on Stack Overflow then please feel free to migrate it. – JJM Dec 20 '12 at 21:40
  • Hi @JJM. I am in a similar situation where I need to pool my cox models from the different imputed datasets and then validate. In order to generate the one pooled model, how should the baseline cumulative hazards be combined? The logs of the hazard ratios (coefficients) can be pooled easily as they have asymptotic normality. However, to calculate survival probabilities you also need an estimate of the baseline (cumulative) hazard. This does not have asymptotic normality, as far as I am aware, so I am unsure how to pool multiple coxph models into a single model. Many thanks if you see this. – AP30 Feb 26 '18 at 15:19

2 Answers

13

The fit.mult.impute function in the Hmisc package will draw imputations created from mice just as it will from aregImpute. cph will work with fit.mult.impute. The harder question is how to do validation through resampling when also doing multiple imputation. I don't think anyone has really solved that. I usually take the easy way out and use single imputation to validate the model, using the Hmisc transcan function, but using multiple imputation to fit the final model and to get standard errors.

Frank Harrell
  • 1
    Thanks for your helpful reply, Dr. Harrell. I'd just like to sum up my understanding of what you said. Please correct me if I'm misreading it: `fit.mult.impute()`: Use this to pool the `cph()` models (5 of them, based on 5 imputations from `mice`) and obtain pooled hazard ratios and standard errors. `transcan()`: Use this to create a single imputation and validate that. It sounds like this gives a good enough validation. Is all of that correct? I really appreciate your help, Dr. Harrell. – JJM Dec 21 '12 at 12:43
  • 1
    That's correct. The single imputation validation is a temporary stand-in for the multiple imputation fit. – Frank Harrell Dec 21 '12 at 14:27
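
The workflow confirmed in the comments above could be sketched roughly as follows for the veteran example from the question. This is only an illustration under a few assumptions: the seed, the object names (`pooled`, `completed`, `fit1`) and the choice of variables passed to `transcan()` are made up for the example, and the validation step uses a single imputation as a stand-in for the pooled fit rather than a true validation of it.

```r
library(Hmisc)     # fit.mult.impute(), transcan(), impute()
library(rms)       # cph(), validate()
library(survival)  # Surv(), veteran data
library(mice)

set.seed(1)                          # assumption: arbitrary seed for reproducibility
veteran <- survival::veteran
veteran$trt   <- factor(veteran$trt)
veteran$prior <- factor(veteran$prior)

# Set a few values to NA, as in the question
veteran[sample(137, 4), "trt"]      <- NA
veteran[sample(137, 4), "celltype"] <- NA
veteran[sample(137, 4), "prior"]    <- NA

# Step 1: multiple imputation with mice, then a pooled cph fit.
# fit.mult.impute() refits the model on each completed data set and
# combines the coefficients and their variances.
impvet <- mice(veteran, printFlag = FALSE)
pooled <- fit.mult.impute(Surv(time, status) ~ trt + celltype + karno + age + prior,
                          cph, impvet, data = veteran)
print(pooled)

# Step 2: single imputation with transcan, used only for validation.
# Assumption: all columns are included in the imputation model.
trans <- transcan(~ trt + celltype + time + status + karno + diagtime + age + prior,
                  imputed = TRUE, data = veteran, pl = FALSE, pr = FALSE)
imputed   <- impute(trans, data = veteran, list.out = TRUE, pr = FALSE)
completed <- veteran
completed[names(imputed)] <- imputed

# Step 3: fit on the singly imputed data and run the bootstrap validation
fit1 <- cph(Surv(time, status) ~ trt + celltype + karno + age + prior,
            data = completed, x = TRUE, y = TRUE)
validate(fit1, B = 20)
```

The hazard ratios and standard errors to report would come from `pooled`; `fit1` exists only so that `validate()` has a complete data set to resample.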
1

I looked through the examples in the Hmisc documentation for the fit.mult.impute() function but could not find one for coxph. In case someone is looking for the same thing, here is an example of how I used fit.mult.impute() to pool Cox regression fits:

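library(Hmisc)     # transcan(), fit.mult.impute()
library(survival)  # Surv(), coxph()

# Simulated predictors, with some values set to missing
x1 <- factor(sample(c('a','b','c'),100,TRUE))
x2 <- (x1=='b') + 3*(x1=='c') + rnorm(100)
y <- x2 + 1*(x1=='c') + rnorm(100)
x1[1:20] <- NA
x2[18:23] <- NA
# Simulated follow-up time and event indicator
ttocvd = sample(0:20, 100, replace = TRUE)
CVD = sample(0:1, 100, replace = TRUE)
d <- data.frame(x1,x2,y, ttocvd, CVD)
# 10 multiple imputations with transcan
f <- transcan(~y + x1 + x2+CVD+ ttocvd, n.impute=10, shrink=TRUE, data=d)
#f <- mice(d) #if using mice imputation 
# Fit coxph on each imputed data set and pool the results
h <- fit.mult.impute(Surv(ttocvd, CVD) ~ x1 + x2, coxph, f, data=d)
summary(h)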
D.deng