How to interpret the output of predict.coxph?

Question

After fitting a coxmodel it is possible to make predictions and retrieve the relative risk of new data. What I don't understand is how the relative risk is computed for an individual and what is it relative to (i.e. the average of the population)? Any recommendations for resources to help understand (I not very advanced in survival analysis so the simpler the better)?

score 42 · Accepted Answer · edited May 19 '21 at 14:12

Edit: the following description applies to survival versions 3.2-8 and below. Starting with version 3.2-9, the default behavior of predict.coxph() changes with respect to treating 0/1 (dummy indicator) variables. See NEWS.

predict.coxph() computes the hazard ratio relative to the sample average for all $p$ predictor variables. Factors are converted to dummy predictors as usual whose average can be calculated. Recall that the Cox PH model is a linear model for the log-hazard $\ln h(t)$:

$$ \ln h(t) = \ln h_{0}(t) + \beta_{1} X_{1} + \dots + \beta_{p} X_{p} = \ln h_{0}(t) + \bf{X} \bf{\beta} $$

Where $h_{0}(t)$ is the unspecified baseline hazard. Equivalently, the hazard $h(t)$ is modeled as $h(t) = h_{0}(t) \cdot e^{\beta_{1} X_{1} + \dots + \beta_{p} X_{p}} = h_{0}(t) \cdot e^{\bf{X} \bf{\beta}}$. The hazard ratio between two persons $i$ and $i'$ with predictor values $\bf{X}_{i}$ and $\bf{X}_{i'}$ is thus independent of the baseline hazard and independent of time $t$:

$$ \frac{h_{i}(t)}{h_{i'}(t)} = \frac{h_{0}(t) \cdot e^{\bf{X}_{i} \bf{\beta}}}{h_{0}(t) \cdot e^{\bf{X}_{i'} \bf{\beta}}} = \frac{e^{\bf{X}_{i} \bf{\beta}}}{e^{\bf{X}_{i'} \bf{\beta}}} $$

For the estimated hazard ratio between persons $i$ and $i'$, we just plug in the coefficient estimates $b_{1}, \ldots, b_{p}$ for the $\beta_{1}, \ldots, \beta_{p}$, giving $e^{\bf{X}_{i} \bf{b}}$ and $e^{\bf{X}_{i'} \bf{b}}$.

As an example in R, I use the data from John Fox' appendix on the Cox-PH model which provides a very nice introductory text. First, we fetch the data and build a simple Cox-PH model for the time-to-arrest of released prisoners (fin: factor - received financial aid with dummy coding "no" -> 0, "yes" -> 1, age: age at the time of release, prio: number of prior convictions):

> URL   <- "https://socialsciences.mcmaster.ca/jfox/Books/Companion/data/Rossi.txt"
> Rossi <- read.table(URL, header=TRUE)                  # our data
> Rossi[1:3, c("week", "arrest", "fin", "age", "prio")]  # looks like this
  week arrest fin age prio
1   20      1  no  27    3
2   17      1  no  18    8
3   25      1  no  19   13

> library(survival)                                      # for coxph()    
> fitCPH <- coxph(Surv(week, arrest) ~ fin + age + prio, data=Rossi)    # Cox-PH model
> (coefCPH <- coef(fitCPH))                              # estimated coefficients
     finyes         age        prio 
-0.34695446 -0.06710533  0.09689320

Now we plug in the sample averages for our predictors into the $e^{\bf{X} \bf{b}}$ formula:

meanFin  <- mean(as.numeric(Rossi$fin) - 1)   # average of financial aid dummy
meanAge  <- mean(Rossi$age)                   # average age
meanPrio <- mean(Rossi$prio)                  # average number of prior convictions
rMean <- exp(coefCPH["finyes"]*meanFin        # e^Xb
           + coefCPH["age"]   *meanAge
           + coefCPH["prio"]  *meanPrio)

Now we plug in the predictor values of the first 4 persons into the $e^{\bf{X} \bf{b}}$ formula.

r1234 <- exp(coefCPH["finyes"]*(as.numeric(Rossi[1:4, "fin"])-1)
           + coefCPH["age"]   *Rossi[1:4, "age"]
           + coefCPH["prio"]  *Rossi[1:4, "prio"])

Now calculate the relative risk for the first 4 persons against the sample average and compare to the output from predict.coxph().

> r1234 / rMean
[1] 1.0139038 3.0108488 4.5703176 0.7722002

> relRisk <- predict(fitCPH, Rossi, type="risk")   # relative risk
> relRisk[1:4]
        1         2         3         4 
1.0139038 3.0108488 4.5703176 0.7722002

If you have a stratified model, the comparison in predict.coxph() is against the strata-averages, this can be controlled via the reference option that is explained in the help page.

+1 because it is not obvious to get what predict.coxph exactly does from the help page! — ocram, Dec 02 '12 at 12:29
@Zhubarb `fin` is binary, so the numerical representation of the factor just has the values 1 and 2. Subtracting 1 gives us the dummy-coded variable with values 0 and 1 that also appears in the design matrix. Note that this won't work for factors with more than 2 levels. It is certainly debatable whether averaging dummy variables makes sense, but that's what `predict.coxph()` does. — caracal, Jun 02 '14 at 07:03
In words, how would you interpret a hazard ratio of 3.01 (e.g. relRisk[2])? — RNB, Sep 23 '16 at 10:34
@caracal Do you have a similar example for calculating the standard errors of the fitted value? When using a linear model, I can use the formula s.e=unname(rowSums((Xp V) * Xp)), where v=covariance matrix, Xp = covariates to predict for, to get the same standard errors given by predict function. When using the same formula for s.e of a fitted value from a cox model, they do not match up. I assume this is because predict function gives the standard error of the ratio which you highlight above. — AP30, Feb 14 '18 at 17:56
@AP30 I'm sorry not to be of help here. It might be worth studying the source of `survival::predict.coxph()` at [https://github.com/cran/survival/blob/master/R/predict.coxph.R](https://github.com/cran/survival/blob/master/R/predict.coxph.R) — caracal, Feb 15 '18 at 09:39
Very clear answer. Thank you. I have a question about using the mean of a skewed continuous covariate for the reference average hazard calculation (by strata or overall sample). Wouldn't an extreme positive skew on, for example, age make any reference mean age a poor measure of central tendency? A median would be a better statistic unless skewed covariates can be transformed to normal? — Seanosapien, Mar 03 '20 at 12:05

How to interpret the output of predict.coxph?

1 Answers1

Linked