Fun fact: the 0.632 quantile of the distribution is $\lambda$ (without regard to $k$), so one way to estimate $\lambda$ is the sample 0.632 quantile:
    wd <- scan()
    1: 604 104 224 200 1444 1076 1308 6084 468 2308
    11: 
    Read 10 items
    lam0 <- quantile(wd, p = 0.632)
    lam0
       63.2% 
    1235.616 
Another fun fact is the basis of the Weibull plot:
$\ln(-\ln(1- F(x))) = k\ln(x) -k\ln(\lambda)$
If we look at approximating that relationship by sample values:
$\ln(-\ln(1-\hat F(x)))= \hat{k}_0\ln(x) -\hat{k}_0\ln(\hat\lambda_0)$
That is, a plot of $\ln(-\ln(1-\hat F(x)))$ against $\ln(x)$ should look roughly linear, and its slope should be a reasonable estimate of $k$; one could use regression to estimate it. However, when working with data it is $\ln(x)$ that is the random quantity while the plotting positions on the left are fixed, so it's common to recast the relationship and regress $\ln(x)$ on $\ln(-\ln(1-\hat F(x)))$. The reciprocal of the slope is then an estimate of the shape parameter (and, conveniently, exponentiating the intercept then gives an estimate of the scale).
For the values taken by $\hat F$ in such a plot it is common to use $\frac{i-\alpha}{n+1-2\alpha}$ for $\hat F(x_{(i)})$, where $x_{(i)}$ is the $i$th smallest observation (just as for normal scores plots). The Wikipedia page suggests $\alpha=0.3$, but the default in R's ppoints should be fine:
    ws <- sort(wd)
    Fh <- ppoints(ws)
    k0 <- 1/lm(log(ws) ~ log(-log(1 - Fh)))$coefficients[2]
(You can avoid the need to sort by using the same idea as in stats:::qqnorm.default -- it avoids explicitly rearranging the data by using ppoints(n)[order(order(y))] to get the plotting positions in the same order as the data -- but I think the version above is clearer.)
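As a minimal sketch of that trick (re-entering the data from above so the snippet is self-contained): the double order() recovers the ranks of the observations, so each data point picks up its own plotting position without the data itself being sorted, and the regression gives the same shape estimate as the sorted version.

```r
# the data read in above
wd <- c(604, 104, 224, 200, 1444, 1076, 1308, 6084, 468, 2308)
n <- length(wd)
# plotting positions aligned with the data order; no explicit sort of wd
Fh2 <- ppoints(n)[order(order(wd))]
# regress log(x) on log(-log(1 - F)); reciprocal of slope estimates shape
k0b <- 1/coef(lm(log(wd) ~ log(-log(1 - Fh2))))[2]
```

Because regression is unaffected by jointly reordering response and predictor, this matches the sorted-data fit exactly.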
- You could alternatively recover $\lambda$ from the intercept of that straight-line fit.
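For instance (a self-contained sketch, re-entering the data from above): with $\ln(x)$ regressed on $\ln(-\ln(1-\hat F(x)))$, the intercept estimates $\ln(\lambda)$, so exponentiating it gives a rough scale estimate.

```r
# the data read in above
wd <- c(604, 104, 224, 200, 1444, 1076, 1308, 6084, 468, 2308)
ws <- sort(wd)
Fh <- ppoints(ws)
# intercept of this regression estimates log(lambda)
fit <- lm(log(ws) ~ log(-log(1 - Fh)))
lam0b <- exp(coef(fit)[1])   # crude estimate of the scale parameter
```

This should land in the same ballpark as the 0.632-quantile estimate obtained earlier.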
These sorts of approaches are common and usually sufficient.
However, with your particular data there are several issues. The first is simply one of scale, and rescaling avoids it:
    library(MASS)   # for fitdistr
    fitdistr(wd/10, densfun = "weibull", start = list(shape = k0, scale = lam0/10))
          shape         scale   
      0.8912185   128.0091222 
     (0.2104516) ( 47.7940515)
Note that here I am using the built-in Weibull density rather than yours; it is a tiny bit more numerically stable (though not by much, by the look of things). Note also that the reported scale parameter is now one tenth of the actual scale.
So that looked like it converged, right? Well, first there were warnings, and we had to rescale the data to get a fit at all, so it's behaving pretty badly -- we're not at all sure it actually converged. We may need to play with the convergence criteria.
If you instead rescale a bit further it will even produce an answer without a starting value, but we still get warnings, so again we're not sure it converged properly:
    fitdistr(wd/100, densfun = "weibull")
          shape       scale  
      0.8941211   12.9974798 
     (0.2109907) ( 4.8739202)
Fortunately, we can do better still -- the survreg function in the survival package (which comes with R but isn't loaded by default) can fit Weibull regression models, so that's another way to get parameter estimates. The model is parameterized differently, but that presents no problem; we just convert to the required parameterization:
    library(survival)
    ones <- rep(1, length(wd))    # a vector of 1's
    Sw <- Surv(wd, ones)          # a survival object with all "times" in wd uncensored
    sro <- survreg(Sw ~ 1)$icoef  # fit an intercept-only model
                                  # (i.e. the same distribution to every data point)
                                  # ... no warnings are issued here
    lam0 <- exp(sro[1])           # pull out the parameters and
    k0 <- exp(-sro[2])            # transform to our parameterization
    c(k0, lam0)                   # take a look at the values (ignore the names)
     Log(scale)  (Intercept) 
      0.8941177 1299.7480322 
    fitdistr(wd/100, densfun = "weibull", start = list(shape = k0, scale = lam0/100))
          shape       scale  
      0.8941177   12.9974803 
     (0.2109901) ( 4.8739411)
That converges to the same values (up to the scale factor), again without warnings. Scaling the values back up, we have:

          shape        scale  
      0.8941177   1299.74803  
     (0.2109901) ( 487.39411) 
A bit of a roundabout way to do it, but it turns out that the values we weren't sure had converged essentially had.
Changing the options (such as the convergence criteria) that fitdistr passes through to optim may well solve the problem more easily than this, but I find survreg often gets there on the Weibull when fitdistr has trouble.
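As a sketch of that first option (self-contained, re-entering the data; the starting values are round versions of the estimates found above): fitdistr passes extra arguments through to optim, so we can tighten the relative convergence tolerance and raise the iteration cap via optim's control list.

```r
library(MASS)
# the data read in above
wd <- c(604, 104, 224, 200, 1444, 1076, 1308, 6084, 468, 2308)
# control is passed through to optim: tighter tolerance, more iterations
fit <- fitdistr(wd/100, densfun = "weibull",
                start = list(shape = 0.89, scale = 13),  # rough starts from above
                control = list(maxit = 1000, reltol = 1e-12))
fit$estimate
```

With the tighter tolerance this should land on essentially the same estimates as the survreg route (shape about 0.894, scale about 13 on the /100 scale).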