51

I am fitting a multivariate Cox regression and have my significant independent variables and their beta values. The model fits my data very well.

Now I would like to use my model to predict the survival of a new observation, but I am unclear how to do this with a Cox model. In a linear or logistic regression it would be easy: plug the values of the new observation into the regression, multiply them by the betas, and the result is the prediction of my outcome.

How can I determine my baseline hazard? I need it in addition to computing the prediction.

How is this done in a Cox model?

BGreene
Marja

5 Answers

41

Under the Cox model, the estimated hazard for individual $i$ with covariate vector $x_i$ has the form $$\hat{h}_i(t) = \hat{h}_0(t) \exp(x_i' \hat{\beta}),$$ where $\hat{\beta}$ is found by maximising the partial likelihood, while $\hat{h}_0$ follows from the Nelson-Aalen-type (Breslow) estimator, $$ \hat{h}_0(t_i) = \frac{d_i}{\sum_{j:t_j \geq t_i} \exp(x_j' \hat{\beta})} $$ with $t_1, t_2, \dotsc$ the distinct event times and $d_i$ the number of deaths at $t_i$ (see, e.g., Section 3.6).

Similarly, $$\hat{S}_i(t) = \hat{S}_0(t)^{\exp(x_i' \hat{\beta})}$$ with $\hat{S}_0(t) = \exp(- \hat{\Lambda}_0(t))$ and $$\hat{\Lambda}_0(t) = \sum_{j:t_j \leq t} \hat{h}_0(t_j).$$
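As a rough illustration of the formulas above, here is a minimal base-R sketch that computes the baseline hazard increments, the cumulative baseline hazard, and the predicted survival curve for a new observation. The toy data, the single covariate, and the value of `beta` are all made up for illustration; in practice `beta` would come from the fitted model.

```r
# Toy data (hypothetical): follow-up time, event indicator (1 = death), one covariate
time   <- c(2, 3, 3, 5, 7, 8)
status <- c(1, 1, 0, 1, 0, 1)
x      <- c(0.5, -1.0, 0.2, 1.5, -0.3, 0.0)
beta   <- 0.4   # assumed value; normally the estimate from the fitted Cox model

risk_score  <- exp(x * beta)                    # exp(x_j' beta) for every subject
event_times <- sort(unique(time[status == 1]))  # distinct event times

# Baseline hazard increment at each event time:
# number of deaths divided by the summed risk scores of those still at risk
h0 <- sapply(event_times, function(t) {
  d <- sum(time == t & status == 1)
  d / sum(risk_score[time >= t])
})

# Cumulative baseline hazard and baseline survival at the event times
Lambda0 <- cumsum(h0)
S0      <- exp(-Lambda0)

# Predicted survival curve for a new observation with covariate value x_new
x_new <- 1.0
S_new <- S0 ^ exp(x_new * beta)

data.frame(time = event_times, h0, Lambda0, S0, S_new)
```

The resulting table gives a step function: the predicted survival probability at any time $t$ is the value at the last event time not exceeding $t$.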

EDIT: This might also be of interest :-)

ocram
  • Maybe I should ask my question in another way. What I am looking for is the way to predict an outcome in Cox model. For my survival analysis, I used a Cox proportional hazard approach. I got a list of sig. factors for my outcome and now, I would like to use my model to predict the survival of a new observation. Do I need estimation of baseline hazard? How can I predict my survival? – Marja Sep 11 '12 at 07:14
  • Survival analysis is not commonly used to predict future times to an event. There could be a variety of ways to do this say by applying median survival time or mean survival time to take two examples. But the properties of the survival curve that you need would require an estimated survival curve which for the Cox model would require specification of the baseline hazard function (that is not provided in the Cox approach). – Michael R. Chernick Sep 11 '12 at 10:58
  • @Marjan by saying that you got a list of "significant" factors you imply that you used variable selection that may result in unreliable predictions. Bootstrap validation, repeating all variable selection steps for, say, 300 resamples, may be worth doing. – Frank Harrell Sep 11 '12 at 11:29
  • That is exactly my question... I need an estimate of the baseline hazard function to be able to make the prediction, correct? Do you know any method for estimating it? – Marja Sep 11 '12 at 11:32
  • Doesn't the formula given in this post answer your question? – ocram Sep 11 '12 at 12:18
  • @Frank, great point. Actually, I have done this validation using the jackknife method; now I want to find out how suitable and reliable my jackknife results for the deleted observations are. My idea is that I cannot simply put (as in linear or logistic regression) the values of the omitted observation into the jackknife regression and get the outcome. So I need a method to do this prediction. – Marja Sep 12 '12 at 07:44
  • @ocram, I don't know :( Could you tell me whether I can use this estimate in my case? Can I estimate the baseline hazard, compute the predicted outcome, and multiply it by the estimate of the baseline hazard? Is that the correct predicted value? – Marja Sep 12 '12 at 07:48
  • @Marjan the jackknife may not properly reflect uncertainty caused by variable selection. The bootstrap properly shows more variability in which variables are labeled "significant". If you want to do a "relative validation" you can show that predictive discrimination is good after correcting for overfitting. This does not require dealing with the baseline hazard, but is validating relative log hazard estimates. The `validate` function in the R `rms` package in conjunction with the `cph` function will do that. The only stepwise algorithm implemented in `validate` is backwards stepdown. – Frank Harrell Sep 12 '12 at 12:44
  • @ocram, it sounds good. I will buy it... hopefully helps :) – Marja Sep 12 '12 at 13:18
  • @Frank, I will try to validate my model using different methods; I have also checked it with bootstrapping. My main question is how to predict the outcomes in Cox regression. Do you have any idea? I work mainly with SAS. Do you know of any helpful procedures in SAS? – Marja Sep 12 '12 at 13:21
  • Getting predicted relative hazards (i.e., the linear predictor) is quite simple. But I quit using SAS in 1991. – Frank Harrell Sep 13 '12 at 17:57
  • The link has gone dead :-(. – gung - Reinstate Monica Oct 13 '14 at 03:02
  • Is there a way to predict the survival time T for a specific individual? I mean, given a list of values for the covariates, how can one find the time after which the individual is most likely to die? – statBeginner Feb 12 '15 at 21:31
  • Could you comment on how this generalizes to a counting process (i.e. more than one events possible for each individual). Does one use the same formula but with somewhat different meaning of the quantities involved? – Roger Vadim Oct 23 '20 at 09:15
15

If you use R, the function `predictSurvProb` in the `pec` package can give you absolute risk estimates for new data based on an existing Cox model.

The mathematical details I cannot explain.

EDIT: The function provides survival probabilities, which I have so far taken as 1-(Event probability).

EDIT 2:

One can do without the pec package. Using only the survival package, the following function returns the absolute risk based on a Cox model:

library(survival)
# absolute risk = 1 - predicted survival probability at the given time(s)
risk <- function(model, newdata, time) {
  as.numeric(1 - summary(survfit(model, newdata = newdata, se.fit = FALSE, conf.int = FALSE), times = time)$surv)
}
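A possible way to call such a helper, sketched under the assumption that the model is fit to the `ovarian` data shipped with the `survival` package and that a one-year horizon (365 days) is of interest. The helper is redefined here, without the `conf.int` argument, so that the snippet runs on its own; the ages and the time horizon are illustrative choices, not part of the original answer.

```r
library(survival)

# Helper: absolute risk = 1 - predicted survival probability at `time`
risk <- function(model, newdata, time) {
  sf <- survfit(model, newdata = newdata, se.fit = FALSE)
  as.numeric(1 - summary(sf, times = time)$surv)
}

# Cox model with age as the only covariate (illustrative choice)
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)

# Predicted probability of death within 365 days for a 60- and a 70-year-old
p <- risk(fit, newdata = data.frame(age = c(60, 70)), time = 365)
p
```

Each element of `p` corresponds to one row of `newdata`.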
miura
  • 1-Survival probability is the cumulative hazard. I think the OP requests the instantaneous hazard function (of the baseline) or some kind of smoothed estimate of it (`muhaz` packages in R). – ECII Mar 09 '13 at 21:32
  • 1-Survival probability is not the cumulative hazard. In the absence of competing risks the two are connected as detailed on https://en.wikipedia.org/wiki/Survival_analysis#Hazard_function_and_cumulative_hazard_function. – miura Jul 08 '15 at 12:34
  • 1-Survival probability = Failure rate (assuming only 1x method of failure). The relationship of Survival probability to cumulative hazard is outlined in the accepted answer: `S(t)=exp(−Λ(t))` where `Λ(t)` is the cumulative hazard. – NickBraunagel Sep 18 '19 at 18:16
15

Maybe you would also like to try something like this? Fit a Cox proportional hazards model and use it to get the predicted Survival curve for a new instance.

Taken out of the help file for the survfit.coxph in R (I just added the lines part)

library(survival)
# fit a Cox proportional hazards model and plot the
# predicted survival for a 60 year old
fit <- coxph(Surv(futime, fustat) ~ age, data=ovarian)
plot(survfit(fit, newdata=data.frame(age=60)),
     xscale=365.25, xlab="Years", ylab="Survival", conf.int=FALSE)
# also plot the predicted survival for a 70 year old
# (axis labels are already set by plot(), so they are not repeated here)
lines(survfit(fit, newdata=data.frame(age=70)),
     xscale=365.25)

Keep in mind, though, that for the proportional hazards assumption to still hold for your prediction, the patient for whom you predict should come from a group that is qualitatively the same as the one used to fit the Cox proportional hazards model.

gung - Reinstate Monica
Slak
6

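The `basehaz` function in the `survival` package provides the cumulative baseline hazard at the observed time points (by default for a reference subject at the covariate means; use `centered = FALSE` for covariates equal to zero). From that you can work your way through the math that ocram provides and include the hazard ratios (exponentiated coefficients) from your `coxph` estimates.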
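A minimal sketch of that route, assuming a model with age as the only covariate fit to the `ovarian` data from the `survival` package (both are illustrative choices). It builds the survival curve for a new 60-year-old by hand from `basehaz` and the fitted coefficient, then compares it with what `survfit` returns; the two should agree up to numerical tolerance.

```r
library(survival)

fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)

# Cumulative baseline hazard for a subject with all covariates equal to zero
bh <- basehaz(fit, centered = FALSE)   # columns: hazard (cumulative), time

# Linear predictor x'beta for a new 60-year-old, computed by hand
lp <- sum(coef(fit) * c(age = 60))

# S(t | x) = exp(-Lambda0(t) * exp(x'beta))
surv_manual <- exp(-bh$hazard * exp(lp))

# Cross-check against survfit() on the same time grid
sf <- survfit(fit, newdata = data.frame(age = 60), se.fit = FALSE)
all.equal(surv_manual, as.numeric(sf$surv))
```

Doing the calculation by hand is mostly useful for understanding or for porting the prediction outside R; within R, `survfit` with `newdata` does the same work in one call.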

ECII
2

The whole point of the Cox model is the proportional hazards assumption and the use of the partial likelihood. The partial likelihood has the baseline hazard function eliminated, so you do not need to specify one. That is the beauty of it!
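To make the cancellation explicit, here is a short sketch of the partial likelihood, assuming no tied event times and writing $h_0$ for the baseline hazard and $x_i$ for the covariate vector of individual $i$. Each factor is the conditional probability that individual $i$ fails at $t_i$ given that one member of the risk set fails there, and $h_0(t_i)$ appears in both numerator and denominator: $$ L(\beta) = \prod_{i:\,\text{event}} \frac{h_0(t_i) \exp(x_i' \beta)}{\sum_{j:t_j \geq t_i} h_0(t_i) \exp(x_j' \beta)} = \prod_{i:\,\text{event}} \frac{\exp(x_i' \beta)}{\sum_{j:t_j \geq t_i} \exp(x_j' \beta)}. $$ The right-hand side depends on $\beta$ only, which is why $\beta$ can be estimated without specifying $h_0$.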

Michael R. Chernick
  • If you however want to get an estimate of the hazard or the survival for a particular value of the covariate vector, then you do need an estimate of the baseline hazard or survival. The Nelson-Aalen estimate usually does the job... – ocram Sep 10 '12 at 15:13
  • Often with the Cox model you are comparing two survival functions and the key is the hazard ratio rather than the hazard function. The baseline hazard is like a nuisance parameter that Cox so cleverly eliminated from the problem using the proportional hazards assumption. Whatever method you would like to use for estimating the hazard function and/or the baseline hazard in the context of the model would require using the Cox form of the model which forces proportionality. – Michael R. Chernick Sep 10 '12 at 15:30
  • Thank you so much, It would be great if you see my comment on the answer of ocram. Maybe you could help me too? – Marja Sep 11 '12 at 07:17
  • You can also stratify on factors that are not in proportional hazards. But at any rate the Cox model and its after-the-fit estimator of the baseline hazard can be used to get predicted quantiles of survival time, various survival probabilities, and predicted mean survival time if you have long-term follow-up. All these quantities are easy to get in the R package `rms`. – Frank Harrell Sep 11 '12 at 11:31
  • You don't need to specify it, but it is estimated. – DWin Oct 31 '18 at 21:55