After fitting a coxmodel it is possible to make predictions and retrieve the relative risk of new data. What I don't understand is how the relative risk is computed for an individual and what is it relative to (i.e. the average of the population)? Any recommendations for resources to help understand (I not very advanced in survival analysis so the simpler the better)?
1 Answers
Edit: the following description applies to survival
versions 3.2-8 and below. Starting with version 3.2-9, the default behavior of predict.coxph()
changes with respect to treating 0/1 (dummy indicator) variables. See NEWS.
predict.coxph()
computes the hazard ratio relative to the sample average for all $p$ predictor variables. Factors are converted to dummy predictors as usual whose average can be calculated. Recall that the Cox PH model is a linear model for the log-hazard $\ln h(t)$:
$$ \ln h(t) = \ln h_{0}(t) + \beta_{1} X_{1} + \dots + \beta_{p} X_{p} = \ln h_{0}(t) + \bf{X} \bf{\beta} $$
Where $h_{0}(t)$ is the unspecified baseline hazard. Equivalently, the hazard $h(t)$ is modeled as $h(t) = h_{0}(t) \cdot e^{\beta_{1} X_{1} + \dots + \beta_{p} X_{p}} = h_{0}(t) \cdot e^{\bf{X} \bf{\beta}}$. The hazard ratio between two persons $i$ and $i'$ with predictor values $\bf{X}_{i}$ and $\bf{X}_{i'}$ is thus independent of the baseline hazard and independent of time $t$:
$$ \frac{h_{i}(t)}{h_{i'}(t)} = \frac{h_{0}(t) \cdot e^{\bf{X}_{i} \bf{\beta}}}{h_{0}(t) \cdot e^{\bf{X}_{i'} \bf{\beta}}} = \frac{e^{\bf{X}_{i} \bf{\beta}}}{e^{\bf{X}_{i'} \bf{\beta}}} $$
For the estimated hazard ratio between persons $i$ and $i'$, we just plug in the coefficient estimates $b_{1}, \ldots, b_{p}$ for the $\beta_{1}, \ldots, \beta_{p}$, giving $e^{\bf{X}_{i} \bf{b}}$ and $e^{\bf{X}_{i'} \bf{b}}$.
As an example in R, I use the data from John Fox' appendix on the Cox-PH model which provides a very nice introductory text. First, we fetch the data and build a simple Cox-PH model for the time-to-arrest of released prisoners (fin
: factor - received financial aid with dummy coding "no"
-> 0, "yes"
-> 1, age
: age at the time of release, prio
: number of prior convictions):
> URL <- "https://socialsciences.mcmaster.ca/jfox/Books/Companion/data/Rossi.txt"
> Rossi <- read.table(URL, header=TRUE) # our data
> Rossi[1:3, c("week", "arrest", "fin", "age", "prio")] # looks like this
week arrest fin age prio
1 20 1 no 27 3
2 17 1 no 18 8
3 25 1 no 19 13
> library(survival) # for coxph()
> fitCPH <- coxph(Surv(week, arrest) ~ fin + age + prio, data=Rossi) # Cox-PH model
> (coefCPH <- coef(fitCPH)) # estimated coefficients
finyes age prio
-0.34695446 -0.06710533 0.09689320
Now we plug in the sample averages for our predictors into the $e^{\bf{X} \bf{b}}$ formula:
meanFin <- mean(as.numeric(Rossi$fin) - 1) # average of financial aid dummy
meanAge <- mean(Rossi$age) # average age
meanPrio <- mean(Rossi$prio) # average number of prior convictions
rMean <- exp(coefCPH["finyes"]*meanFin # e^Xb
+ coefCPH["age"] *meanAge
+ coefCPH["prio"] *meanPrio)
Now we plug in the predictor values of the first 4 persons into the $e^{\bf{X} \bf{b}}$ formula.
r1234 <- exp(coefCPH["finyes"]*(as.numeric(Rossi[1:4, "fin"])-1)
+ coefCPH["age"] *Rossi[1:4, "age"]
+ coefCPH["prio"] *Rossi[1:4, "prio"])
Now calculate the relative risk for the first 4 persons against the sample average and compare to the output from predict.coxph()
.
> r1234 / rMean
[1] 1.0139038 3.0108488 4.5703176 0.7722002
> relRisk <- predict(fitCPH, Rossi, type="risk") # relative risk
> relRisk[1:4]
1 2 3 4
1.0139038 3.0108488 4.5703176 0.7722002
If you have a stratified model, the comparison in predict.coxph()
is against the strata-averages, this can be controlled via the reference
option that is explained in the help page.

- 132,789
- 81
- 357
- 650

- 11,549
- 49
- 63
-
7+1 because it is not obvious to get what predict.coxph exactly does from the help page! – ocram Dec 02 '12 at 12:29
-
1that was great! Very simple to understand! – user4673 Dec 02 '12 at 19:32
-
`meanFin – Zhubarb May 30 '14 at 14:53
-
1@Zhubarb `fin` is binary, so the numerical representation of the factor just has the values 1 and 2. Subtracting 1 gives us the dummy-coded variable with values 0 and 1 that also appears in the design matrix. Note that this won't work for factors with more than 2 levels. It is certainly debatable whether averaging dummy variables makes sense, but that's what `predict.coxph()` does. – caracal Jun 02 '14 at 07:03
-
In words, how would you interpret a hazard ratio of 3.01 (e.g. relRisk[2])? – RNB Sep 23 '16 at 10:34
-
@caracal Do you have a similar example for calculating the standard errors of the fitted value? When using a linear model, I can use the formula s.e=unname(rowSums((Xp V) * Xp)), where v=covariance matrix, Xp = covariates to predict for, to get the same standard errors given by predict function. When using the same formula for s.e of a fitted value from a cox model, they do not match up. I assume this is because predict function gives the standard error of the ratio which you highlight above. – AP30 Feb 14 '18 at 17:56
-
@AP30 I'm sorry not to be of help here. It might be worth studying the source of `survival::predict.coxph()` at [https://github.com/cran/survival/blob/master/R/predict.coxph.R](https://github.com/cran/survival/blob/master/R/predict.coxph.R) – caracal Feb 15 '18 at 09:39
-
Very clear answer. Thank you. I have a question about using the mean of a skewed continuous covariate for the reference average hazard calculation (by strata or overall sample). Wouldn't an extreme positive skew on, for example, age make any reference mean age a poor measure of central tendency? A median would be a better statistic unless skewed covariates can be transformed to normal? – Seanosapien Mar 03 '20 at 12:05