1

I have a survival model calculated like this:

    mv.cox <- coxph(Surv(as.numeric(Time), as.numeric(Status))  ~ 
                     Sex, data = as.data.frame((df))) 

I am looking for a way to calculate a bootstrap for this model/data that should include the Hazard ration, CI and p-value.

I tried using the the following, but it does not give the Hazard or p-value:

    library(coxed)

    coxed(model, method="npsf", bootstrap = TRUE, B=1000)

I also ran the bootcov, but not sure what the output really means.

     require(rms)

    mv.cph <- cph(Surv(as.numeric(Time), as.numeric(Status))  ~ 
                   df$Sex,x=T,y = T, data = as.data.frame((df))) 
    bootcov(mv.cph)
    
    bootcov(mv.cph)
    Cox Proportional Hazards Model
    
     cph(formula = Surv(as.numeric(Time), as.numeric(Status)) ~ 
           df$Sex, 
         data = as.data.frame((df)), x = T, y = T)
    
                            Model Tests    Discrimination    
                                                  Indexes    
     Obs        86    LR chi2     12.44    R2       0.137    
     Events     45    d.f.            1    Dxy      0.316    
     Center 0.2771    Pr(> chi2) 0.0004    g        0.581    
                      Score chi2  14.41    gr       1.788    
                      Pr(> chi2) 0.0001                      
    
        Coef   S.E.   Wald Z Pr(>|Z|)
     df 0.2122 0.0606 3.50   0.0005 

And the validate function:

      mv.cph.val <- validate(mv.cph, method = "boot", B = 100, )

          index.orig training   test optimism index.corrected   n
    Dxy       0.3162   0.3214 0.3162   0.0052          0.3110 100
    R2        0.1369   0.1446 0.1369   0.0077          0.1293 100
    Slope     1.0000   1.0000 1.0606  -0.0606          1.0606 100
    D         0.0324   0.0354 0.0324   0.0030          0.0293 100
    U        -0.0057  -0.0057 0.0029  -0.0086          0.0029 100
    Q         0.0380   0.0411 0.0295   0.0116          0.0264 100
    g         0.5810   0.5899 0.5810   0.0088          0.5722 100  
kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
user2300940
  • 161
  • 7

1 Answers1

3

The coxed package is designed "to calculate duration-based quantities from Cox model results" rather than to show hazard ratios, etc. See the vignette. So what you found is probably a feature, not a bug.

You should be able to do what you want directly with the boot() and boot.ci() functions in the standard R boot package, following standard procedures. You write a function to capture the statistic of interest from the model applied to each bootstrap sample, provide that to the boot() function, then send the output to boot.ci() to get various types of confidence intervals.

The trick with a survival model is to work with the regression coefficients returned by the Cox models directly, with their asymptotic normal distributions, rather than the hazard ratios. Then at the very end you may translate the regression coefficients and standard errors into hazard ratios if you wish.

The rms package can do this and a lot more for you in terms of model validation and calibration via bootstrap, for many types of regression models including Cox.

In response to edited question:

First, make sure that you understand an outline of maximum (partial) likelihood model fitting and the associated likelihood-ratio, score, and Wald tests. Second, note that these rms functions preferentially report the linear predictor of log-hazards and their standard errors/CIs rather than the corresponding hazard ratios. Values in hazard-ratio scales can be obtained by exponentiating the values in the linear-predictor scale. In your case there is only 1 predictor, so the linear predictor for each case is simply the product of the regression coefficient times the Sex indicator value.

The bootcov() report is essentially what you get for cph() Cox survival models. "Center" is the average linear predictor over all cases. Under "Model Tests" you have results of the likelihood-ratio (LR) and score tests on the entire model. The LR test is generally considered most reliable, particularly for small sample sizes.

The "Coef" value is the regression coefficient for Sex with the corresponding "SE." Their ratio "Z" is asymptotically distributed as a standard normal variate, providing the basis for the Wald-test p-value. That set of values could be different between a bootcov and a standard cph object, as the coefficient (co)variance matrix, from which the "SE" values are calculated, comes from the bootstrapped models with bootcov() rather than the information matrix obtained from fitting a single model with cph().

With a more complicated model you would have one line for each estimated coefficient. You can use the rms version of anova() to get Wald tests from complicated models that combine all levels of a categorical predictor or that incorporate all non-linear or interaction terms involving a predictor.

The other items reported are measures of model quality that go beyond simple coefficient tests, based on the likelihoods and the linear-predictor values from the modeling. These are explained in Chapters 9 and 10 of Frank Harrell's textbook. Briefly:

"R2" is a likelihood-based analog of the $R^2$ from an ordinary least-squares (OLS) regression, limited to values between 0 and 1 and calculated in a way that reproduces the $R^2$ of an OLS model.

"Dxy" is a rank-correlation measure of observed and predicted survival times. If $C$ is the concordance between between comparable pairs of observed and predicted survival times (equivalent to the area under a receiver operating characteristic curve, AUC-ROC), then $D_{xy}= 2(C-0.5)$.

"g" is a Gini index based on the mean absolute difference of linear-predictor values among pairs of cases. It thus indicates a "typical" difference among cases in the model's linear-predictor values. "gr" is just its exponential, putting it into the hazard-ratio scale.

The optimism bootstrapping done by validate() provides refined values of those measures and adds a few more. Models are built on multiple bootstrapped samples, each of which is a "training" set. Each of those models is also applied to the entire data set as a single "test" set. The difference averaged over all the bootstraps is the "optimism" in the modeling due to overfitting of the "training" sets, seen in their application to the "test" set.

As a bootstrapped sample is to the original data sample as the original data sample is to the full population, this suggests that values from the original model ("index.orig") should be corrected by that "optimism" to represent the quality of the original model when it's applied to the full population ("index.corrected").

The "Slope" optimism is the factor by which you need to adjust the slope of the relationship between the linear predictor and outcome from "training" samples to best fit the "test" samples. In more typical cases it is less than 1; for example, a value of 0.9 might be interpreted as about 10% overfitting.

"D," "U," and "Q" from validate() are likelihood-based measures of model quality. "D" is the likelihood-ratio $\chi ^2$ divided by the number of observations, an initial measure of the model's discrimination capacity. "U" estimates the "unreliability" of the model, taking into account the "Slope" optimism. Q is their difference, an optimism-corrected measure of model discrimination.

EdM
  • 57,766
  • 7
  • 66
  • 187
  • Thanks. Tried the bootcov function, but not sure what the output really means. Please see updated question. – user2300940 Dec 23 '21 at 07:33
  • @user2300940 I tried to add some explanation that would make sense both to you and other visitors to the site. – EdM Dec 23 '21 at 19:17