4

I went through a tutorial on survival analysis and learned to compute the median from a weibull:

wb <- survreg(Surv(interval, censored) ~ 1, data = .x, dist = 'weibull')
 wb
Call:
survreg(formula = Surv(interval, censored) ~ 1, data = pdata)

Coefficients:
(Intercept) 
   1.692154 

Scale= 0.8255916 

Loglik(model)= -220389.6   Loglik(intercept only)= -220389.6
n= 99261 

I can compute the median with:

predict(wb, type = "quantile", p = 0.5, newdata = data.frame(1))
       1 
4.013099 

Is this the convention? If the context is churn in marketing or product analytics and the time unit used was months, could I say that the expected lifetime of a new subscription is ~4 months?

I was reading this post: Attempting to find mean of Weibull function in R

In that post, the OP asks about calculating expected value.The accepted answer refers to a formula that use shape and scale, but my wb variable has only a Scale parameter, not a shape one.

Can I compute expected value in a similar manner? How? What's the convention? Just the median? How can I report expected months duration of a new customer or subscription using my wb weibull model?

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
Doug Fir
  • 1,218
  • 1
  • 13
  • 27

1 Answers1

3

This is a great question because you ask about "convention" and not right/wrong. So first of all with respect to convention there's usually two ways of composing a survival time from a survival distribution: mean and median. The problem with the median route is that very often a survival probability prediction never actually goes below 0.5, so median is misleading (though a similar but different problem exists for mean). In your example median makes sense as it's a continuous parametric distribution.

Note that mathematically the expected lifetime is not the median, it's the mean. So be careful how you report that. The median is usually reported as it's a quantity that clinicians (for example) can understand: e.g. "50% of people with the same profile should die by this time".

Now about calculating the mean from your distribution. survreg uses the slightly confusing framework introduced by "The statistical analysis of failure time data" (Kalbfleisch, Prentice). This means that you have to know the conversion from the survreg output to the 'standard' representation for each of the possible distributions, and they don't tell you in the documentation, which is very annoying. An R package I maintain does the hard work for you but it's overly-complicated for this example. However copying the relevant parts from the package:

Below is a table with the distribution from survreg and the corresponding parameters in a standard parametrisation. For the below, scale = wb$scale and location = as.numeric(wb$coefficients[1]).

Distribution Parameters
Weibull(shape, scale) Shape = 1/scale. Scale = exp(location)
Exponential(scale) Scale = exp(location)
Lognormal(logmean,logsd) Logmean = location, Logsd = scale
Loglogistic(scale,shape) Scale = exp(location), Shape = 1/scale

Now you can just substitute these parameters to get the results you want (e.g. look on wikipedia for analytical formulas, or even use one of my packages {distr6})

Edit: Just want to add some clarity to this answer. Substituting the above table into a distribution will give you the expectation of the baseline distribution not a predicted distribution for an individual. This should work in your case as you fit a model with no covariates but not in general.

Edit2: Reproducible example:

library(survival)
fit = survreg(Surv(time, status) ~ 1, data = rats)
scale = fit$scale
location = as.numeric(fit$coefficients[1])

shape = 1/scale
scale = exp(location)

scale * gamma(1 + 1/shape)
# (yes plugging my own package here but just to demonstrate result is same!)
distr6::Weibull$new(shape = shape, scale = scale)$mean()
```
RaphaelS
  • 181
  • 5
  • This is great, thank you! May I ask, while playing around yesterday I tried `predict(wb, newdata=data.frame(1))` which returned 8.096338 . I noticed this is also the same as `exp(intercept)`, `exp(2.091412)`. Is 8.09 the 'expected' value? How can I interpret that? – Doug Fir Dec 21 '20 at 21:03
  • Meh, actually the example I gave in my comment refers to a different WB model and I currently cannot recreate the one I had when I posted. Nevertheless, it seems to hold that when predicting with `newdata= data.frame(1)` I get the exp() of the intercept returned... – Doug Fir Dec 21 '20 at 21:07
  • 1
    So. Firstly `exp(intercept)` is not the expected value and the expected value will depend on the distribution you choose. In general `exp(intercept)` relates to the scale parameter, which means that if your Weibull has a shape of 1, then it is equivalent to Exponential in which case the expected value is `exp(intercept)`, but only in that specific case. – RaphaelS Dec 21 '20 at 21:09
  • Just for absolute clarity, here's what I did and what I got: `scale – Doug Fir Dec 21 '20 at 21:53
  • 1
    I've added a reprex to my answer, which will be easier to read than in the comments – RaphaelS Dec 21 '20 at 22:08
  • 1
    Thank you very much! Feeling better about this now – Doug Fir Dec 21 '20 at 22:25