19

Hi can the same be shown to obtain shape and scale parameter for modified maximum likelihood method

kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
Zay
  • 307
  • 1
  • 3
  • 8
  • 2
    Hi @zaynah and welcome to the site. I'm not sure if your question is *if* your data are compatible with a weibull distribution or what the parameters of a weibull distribution describing your data would be. If you assume that your data follow a weibull distribution and want to find the parameters, you can use `fitdistr(mydata, densfun="weibull")` in `R` to find the parameters via MLE. To make a graph, use the `qqPlot` function from the `car` package: `qqPlot(mydata, distribution="weibull", shape=, scale=)` with the shape and scale parameters that you've found with `fitdistr`. – COOLSerdash May 31 '13 at 11:14
  • hi thanks for a quick reply, my data is mean monthly wind speed for 5 years, it is compatible with weibull. problem is i do not know how to find k and c ie the parameters of weibull .. and i do not know how to compare experimental data with weibull... also what is MLE...:( – Zay May 31 '13 at 11:25
  • MLE = Maximum likelihood estimation. I don't know what software you use, but in `R`, which is freely available, you can install and load the package `MASS` and use `fitdistr` with your data to calculate the estimates of k and c. And then, you can compare your data with the weibull with the estimated parameters using `qqPlot` from the `car` package. – COOLSerdash May 31 '13 at 11:28
  • thanks a lot COOlserdash, i am downloading the R software. – Zay May 31 '13 at 11:31
  • 1
    Ok, here is a step by step tutorial: 1. Download and install R. 2. Install the packages `MASS` and `car` by typing: `install.packages(c("MASS", "car"))`. Load the packages by typing: `library(MASS)` and `library(car)`. 3. [Import your data into `R`](http://www.statmethods.net/input/importingdata.html), preferably with a .txt-file. 4. If your data is called `my.data` use `fitdistr` in the following way: `fitdistr(my.data, distribution="weibull")`. 5. Make a graph as I described in the first comment with `qqPlot`. – COOLSerdash May 31 '13 at 11:34
  • am having error when importing my data.. the data are in the form of first row is the months name.. then first column is the years name.. then filled with numbers... – Zay May 31 '13 at 11:47
  • You don't necessarily need the months and years, just the numbers. What's the error? Maybe you should do a google search for how to import the data into `R`. – COOLSerdash May 31 '13 at 11:50
  • but with just numbers, i wil have only 1 value of k and 1 of c right? yeah am googling it. – Zay May 31 '13 at 11:52
  • Yes, with just the numbers you'll only get 1 k and 1 c. – COOLSerdash May 31 '13 at 11:53
  • # first row contains variable names # assign the variable id to row names # note the / instead of \ on mswindows systems mydata – Zay May 31 '13 at 12:07
  • I think you don't need `row.names="id"`. – COOLSerdash May 31 '13 at 12:08
  • Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : cannot open file 'desktop:/mydata.txt': Invalid argument....( this is the error after i del the row.names) – Zay May 31 '13 at 12:10
  • Well, "desktop" is not a path, is it. A system path must be something like "C:/.../mydata.txt". I'm gonna stop giving you advice here in the comments, as they are too long already :) You can modify/extend your question after you managed to import your data. There are numerous tutorials on how to import data into `R`. – COOLSerdash May 31 '13 at 12:12
  • @COOLSerdash you seem to have a reasonably extensive answer here, spread across many comments. Maybe it should be put together into an answer. – Glen_b May 31 '13 at 13:37
  • Some theory and limitations of the weibull are discussed here http://arxiv.org/abs/1211.3853 and – Abe Jun 01 '13 at 03:37
  • I wrote an expository paper on the Weibull distribution for wind speed and wind power estimation as a course final project. One of the topics I focused on was computing the MLEs for the Weibull parameters. You can find a copy on my blog [here](http://matthewhr.files.wordpress.com/2012/09/mathematical-methods-in-wind-power-modeling.pdf). –  Jul 20 '13 at 16:47

1 Answers1

28

Because @zaynah posted in the comments that the data are thought to follow a Weibull distribution, I'm gonna provide a short tutorial on how to estimate the parameters of such a distribution using MLE (Maximum likelihood estimation). There is a similar post about wind speeds and Weibull distribution on the site.

  1. Download and install R, it's free
  2. Optional: Download and install RStudio, which is a great IDE for R providing a ton of useful functions such as syntax highlighting and more.
  3. Install the packages MASS and car by typing: install.packages(c("MASS", "car")). Load them by typing: library(MASS) and library(car).
  4. Import your data into R. If you have your data in Excel, for example, save them as delimited text file (.txt) and import them in R with read.table.
  5. Use the function fitdistr to calculate the maximum likelihood estimates of your weibull distribution: fitdistr(my.data, densfun="weibull", lower = 0). To see a fully worked out example, see the link at the bottom of the answer.
  6. Make a QQ-Plot to compare your data with a Weibull distribution with the scale and shape parameters estimated at point 5: qqPlot(my.data, distribution="weibull", shape=, scale=)

The tutorial of Vito Ricci on fitting distribution with R is a good starting point on the matter. And there are numerous posts on this site on the subject (see this post too).

To see a fully worked out example of how to use fitdistr, have a look at this post.

Let's look at an example in R:

# Load packages

library(MASS)
library(car)

# First, we generate 1000 random numbers from a Weibull distribution with
# scale = 1 and shape = 1.5

rw <- rweibull(1000, scale=1, shape=1.5)

# We can calculate a kernel density estimation to inspect the distribution
# Because the Weibull distribution has support [0,+Infinity), we are truncate
# the density at 0

par(bg="white", las=1, cex=1.1)
plot(density(rw, bw=0.5, cut=0), las=1, lwd=2,
xlim=c(0,5),col="steelblue")

Weibull KDE

# Now, we can use fitdistr to calculate the parameters by MLE
# The option "lower = 0" is added because the parameters of the Weibull distribution need to be >= 0

fitdistr(rw, densfun="weibull", lower = 0)

     shape        scale   
  1.56788999   1.01431852 
 (0.03891863) (0.02153039)

The maximum likelihood estimates are close to those we arbitrarily set in the generation of the random numbers. Let's compare our data using a QQ-Plot with a hypothetical Weibull distribution with the parameters that we've estimated with fitdistr:

qqPlot(rw, distribution="weibull", scale=1.014, shape=1.568, las=1, pch=19)

QQPlot

The points are nicely aligned on the line and mostly within the 95%-confidence envelope. We would conclude that our data are compatible with a Weibull distribution. This was expected, of course, as we've sampled our values from a Weibull distribution.


Estimating the $k$ (shape) and $c$ (scale) of a Weibull distribution without MLE

This paper lists five methods to estimate the parameters of a Weibull distribution for wind speeds. I'm gonna explain three of them here.

From means and standard deviation

The shape parameter $k$ is estimated as: $$ k=\left(\frac{\hat{\sigma}}{\hat{v}}\right)^{-1.086} $$ and the scale parameter $c$ is estimated as: $$ c=\frac{\hat{v}}{\Gamma(1+1/k)} $$ with $\hat{v}$ is the mean wind speed and $\hat{\sigma}$ the standard deviation and $\Gamma$ is the Gamma function.

Least-squares fit to observed distribution

If the observed wind speeds are divided into $n$ speed interval $0-V_{1},V_{1}-V_{2},\ldots, V_{n-1}-V_{n}$, having frequencies of occurrence $f_{1}, f_{2},\ldots,f_{n}$ and cumulative frequencies $p_{1}=f_{1}, p_{2}=f_{1}+f_{2}, \ldots, p_{n}=p_{n-1}+f_{n}$, then you can fit a linear regression of the form $y=a+bx$ to the values $$ x_{i} = \ln(V_{i}) $$ $$ y_{i} = \ln[-\ln(1-p_{i})] $$ The Weibull parameters are related to the linear coefficients $a$ and $b$ by $$ c=\exp\left(-\frac{a}{b}\right) $$ $$ k=b $$

Median and quartile wind speeds

If you don't have the complete observed wind speeds but the median $V_{m}$ and quartiles $V_{0.25}$ and $V_{0.75}$ $\left[p(V\leq V_{0.25})=0.25, p(V\leq V_{0.75})=0.75\right]$, then $c$ and $k$ can be computed by the relations $$ k = \ln\left[\ln(0.25)/\ln(0.75)\right]/\ln(V_{0.75}/V_{0.25})\approx 1.573/\ln(V_{0.75}/V_{0.25}) $$ $$ c=V_{m}/\ln(2)^{1/k} $$

Comparison of the four methods

Here is an example in R comparing the four methods:

library(MASS)  # for "fitdistr"

set.seed(123)
#-----------------------------------------------------------------------------
# Generate 10000 random numbers from a Weibull distribution
# with shape = 1.5 and scale = 1
#-----------------------------------------------------------------------------

rw <- rweibull(10000, shape=1.5, scale=1)

#-----------------------------------------------------------------------------
# 1. Estimate k and c by MLE
#-----------------------------------------------------------------------------

fitdistr(rw, densfun="weibull", lower = 0)
shape         scale   
1.515380298   1.005562356 

#-----------------------------------------------------------------------------
# 2. Estimate k and c using the leas square fit
#-----------------------------------------------------------------------------

n <- 100 # number of bins
breaks <- seq(0, max(rw), length.out=n)

freqs <- as.vector(prop.table(table(cut(rw, breaks = breaks))))
cum.freqs <- c(0, cumsum(freqs)) 

xi <- log(breaks)
yi <- log(-log(1-cum.freqs))

# Fit the linear regression
least.squares <- lm(yi[is.finite(yi) & is.finite(xi)]~xi[is.finite(yi) & is.finite(xi)])
lin.mod.coef <- coefficients(least.squares)

k <- lin.mod.coef[2]
k
1.515115
c <- exp(-lin.mod.coef[1]/lin.mod.coef[2])
c
1.006004

#-----------------------------------------------------------------------------
# 3. Estimate k and c using the median and quartiles
#-----------------------------------------------------------------------------

med <- median(rw)
quarts <- quantile(rw, c(0.25, 0.75))

k <- log(log(0.25)/log(0.75))/log(quarts[2]/quarts[1])
k
1.537766
c <- med/log(2)^(1/k)
c
1.004434

#-----------------------------------------------------------------------------
# 4. Estimate k and c using mean and standard deviation.
#-----------------------------------------------------------------------------

k <- (sd(rw)/mean(rw))^(-1.086)
c <- mean(rw)/(gamma(1+1/k))
k
1.535481
c
1.006938

All methods yield very similar results. The maximum likelihood approach has the advantage that the standard errors of the Weibull parameters are directly given.


Using bootstrap to add pointwise confidence intervals to the PDF or CDF

We can use a the non-parametric bootstrap to construct pointwise confidence intervals around the PDF and CDF of the estimated Weibull distribution. Here's an R script:

#-----------------------------------------------------------------------------
# 5. Bootstrapping the pointwise confidence intervals
#-----------------------------------------------------------------------------

set.seed(123)

rw.small <- rweibull(100,shape=1.5, scale=1)

xs <- seq(0, 5, len=500)


boot.pdf <- sapply(1:1000, function(i) {
  xi <- sample(rw.small, size=length(rw.small), replace=TRUE)
  MLE.est <- suppressWarnings(fitdistr(xi, densfun="weibull", lower = 0))  
  dweibull(xs, shape=as.numeric(MLE.est[[1]][13]), scale=as.numeric(MLE.est[[1]][14]))
}
)

boot.cdf <- sapply(1:1000, function(i) {
  xi <- sample(rw.small, size=length(rw.small), replace=TRUE)
  MLE.est <- suppressWarnings(fitdistr(xi, densfun="weibull", lower = 0))  
  pweibull(xs, shape=as.numeric(MLE.est[[1]][15]), scale=as.numeric(MLE.est[[1]][16]))
}
)   

#-----------------------------------------------------------------------------
# Plot PDF
#-----------------------------------------------------------------------------

par(bg="white", las=1, cex=1.2)
plot(xs, boot.pdf[, 1], type="l", col=rgb(.6, .6, .6, .1), ylim=range(boot.pdf),
     xlab="x", ylab="Probability density")
for(i in 2:ncol(boot.pdf)) lines(xs, boot.pdf[, i], col=rgb(.6, .6, .6, .1))

# Add pointwise confidence bands

quants <- apply(boot.pdf, 1, quantile, c(0.025, 0.5, 0.975))
min.point <- apply(boot.pdf, 1, min, na.rm=TRUE)
max.point <- apply(boot.pdf, 1, max, na.rm=TRUE)
lines(xs, quants[1, ], col="red", lwd=1.5, lty=2)
lines(xs, quants[3, ], col="red", lwd=1.5, lty=2)
lines(xs, quants[2, ], col="darkred", lwd=2)
#lines(xs, min.point, col="purple")
#lines(xs, max.point, col="purple")

Weibull PDF CIs

#-----------------------------------------------------------------------------
# Plot CDF
#-----------------------------------------------------------------------------

par(bg="white", las=1, cex=1.2)
plot(xs, boot.cdf[, 1], type="l", col=rgb(.6, .6, .6, .1), ylim=range(boot.cdf),
     xlab="x", ylab="F(x)")
for(i in 2:ncol(boot.cdf)) lines(xs, boot.cdf[, i], col=rgb(.6, .6, .6, .1))

# Add pointwise confidence bands

quants <- apply(boot.cdf, 1, quantile, c(0.025, 0.5, 0.975))
min.point <- apply(boot.cdf, 1, min, na.rm=TRUE)
max.point <- apply(boot.cdf, 1, max, na.rm=TRUE)
lines(xs, quants[1, ], col="red", lwd=1.5, lty=2)
lines(xs, quants[3, ], col="red", lwd=1.5, lty=2)
lines(xs, quants[2, ], col="darkred", lwd=2)
lines(xs, min.point, col="purple")
lines(xs, max.point, col="purple")

Weibull CDF CIs

COOLSerdash
  • 25,317
  • 8
  • 73
  • 123
  • +1, nice overview. NB, a slight shortcut might be to use [?qqPlot](http://rss.acs.unt.edu/Rdoc/library/car/html/qq.plot.html) w/ `distribution=weibull` from the car package, which will fit the paramaters via MLE & make the qq-plot in 1 step. – gung - Reinstate Monica May 31 '13 at 17:19
  • @gung Thanks. I'm not aware that qqPlot from `car` calculates the MLE parameters automatically. If I generate a random variable with a weibull distribution (`rweibull`) and use the command `qqPlot(rw, distribution="weibull")` I get an error message saying that must provide the parameters `shape` and `scale` to `qqPlot`. Am I missing something? – COOLSerdash May 31 '13 at 20:48
  • my mistake. Evidently, it only automatically estimates parameters from some distributions, and Weibull isn't one of those. – gung - Reinstate Monica May 31 '13 at 21:39
  • hi, i found that after i import mydata into R, when i do the command,fitdistr(mydata, densfun="weibull") it says error messgae that "mydata" not found.. in fact my data hs been imported into R. any answer would be welcome. – Zay Jun 01 '13 at 05:52
  • @zaynah Could you please edit your answer and post your code that you use to import the data. Please add the error message too. Could you import the data without errors? Did you check if the data was imported correctly? – COOLSerdash Jun 01 '13 at 06:13
  • i import successfully my data..:read.table("C:/Users/Zaynah/Desktop/mydata.csv"), then i got my data. i mean i saw the full data.. then i use the function > fitdistr(mydata, distribution="weibull") then the error was found:"Error in fitdistr(mydata, distribution = "weibull") : object 'mydata' not found" – Zay Jun 01 '13 at 06:37
  • @zaynah That's because `mydata` refers to the whole dataset. What is your data inside the dataset called? If it is called `wind.speed`, for example, you have to type `mydata$wind.speed` to access the variable inside your dataset. Just replace `wind.speed` with your actual name. – COOLSerdash Jun 01 '13 at 06:39
  • but @COOLSerdash there is no name at all.. like you told me, i had only insert the numbers.. so mydata has only rows n columns of numbers..:( – Zay Jun 01 '13 at 06:43
  • @zaynah Please post your R-output in your question (click the edit button below your question). In any case, `mydata` should have exactly 1 column and n rows (n = your sample size). This column has a name. Type `str(mydata)` to get it. Use this name then in this way: `fitdistr(mydata$myname, distribution="weibull")`. – COOLSerdash Jun 01 '13 at 06:46
  • read.table("C:/Users/Zaynah/Desktop/mydata.csv") V1 1 5.28,5.28,4.75,3.69,3.69,5.28,5.81,4.22,5.28,5.28,4.22,3.69 2 2.64,4.75,3.69,4.22,3.69,3.69,5.28,5.28,4.75,4.22,4.22,4.22 3 3.17,3.69,3.69,3.69,4.75,4.75,5.28,5.81,4.22,4.75,4.22,4.22 4 4.22,3.69,3.17,4.22,2.64,3.69,4.22,4.75,4.75,4.22,4.22,3.69 5 3.69,3.69,5.28,4.75,4.22,5.23,5.12,5.81,5.81,5.81,4.43,4.43 ps: the edit button is not working...when i do "str(mydata)" Error in str(mydata) : object 'mydata' not found – Zay Jun 01 '13 at 06:51
  • let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/9037/discussion-between-coolserdash-and-zaynah) – COOLSerdash Jun 01 '13 at 06:51
  • @AndreSilva Andre, first of all, thank you very much for the bounty! I was surprised to see that someone put a bounty on this question. Thanks again! – COOLSerdash Jul 20 '13 at 19:06