Fitting all statistical distributions in R and choose best fit

Question

Is there any function in R that fits all statistical distributions and chooses the best fit based on the log-likelihood and the Kolmogorov-Smirnov D (KSD) statistic?

Something like the software EasyFit.

See `?ecdf` (a joke, but only in part; I'm sure others will explain...) — Aaron left Stack Overflow, Jul 14 '12 at 18:33
The `R` commands [`fitdistr`](http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/fitdistr.html) and [`ks.test`](http://stat.ethz.ch/R-manual/R-patched/library/stats/html/ks.test.html) do this for some distributions. Fitting *all* the distributions is 'pretty much' impossible. — , Jul 14 '12 at 18:35
I usually use EasyFit for this purpose because it does the task pretty well. However, I am now running simulations, so I am looking for equivalent commands in R. — Bioinformatics, Jul 14 '12 at 18:37
I'm not sure what "all statistical distributions" is supposed to include. I'm not aware that there is a fixed, finite list of possible distributions. Certainly there is a relatively small number of distributions that account for >99% of all analyses that are actually run, but even then, this doesn't make sense to me. Typically, people think about which distributions you might consider by thinking about what the data stand for and how they were gathered. That is, most of that work is a priori. Testing against a distribution is usually just a check to see that it's not too unreasonable. — gung - Reinstate Monica, Jul 14 '12 at 18:45
Actually, the data are about process capability [CP=(USL-LSL)/(6*sigma)]. I want to see the distribution of CP by simulation under different sample sizes. — Bioinformatics, Jul 14 '12 at 18:57
Process capability indices have very different distributions depending on the population distribution. For this reason, judging capability by what is appropriate for the normal distribution can be misleading when the actual population distribution has very short or very heavy tails. I discuss bootstrapping the Cpk statistic in my bootstrap text, and Kotz has written two books about capability indices. See the links below. http://www.amazon.com/Bootstrap-Methods-Practitioners-Researchers-Probability/dp/0471756210/ref=sr_1_2?s=books&ie=UTF8&qid=1342294274&sr=1-2&keywords=Michael+Chernick — Michael R. Chernick, Jul 14 '12 at 19:32
For Kotz: http://www.amazon.com/Process-Capability-Indices-Samuel-Kotz/dp/041254380X/ref=sr_1_1?s=books&ie=UTF8&qid=1342294361&sr=1-1&keywords=process+capability+indices and http://www.amazon.com/Process-Capability-Indices-Theory-Practice/dp/0340691778/ref=sr_1_4?s=books&ie=UTF8&qid=1342294416&sr=1-4&keywords=process+capability+indices — Michael R. Chernick, Jul 14 '12 at 19:34
I agree with Gung. Usually you test one or just a few tentatively selected distributions. — Michael R. Chernick, Jul 14 '12 at 19:35
Thanks @Michael for the books and the valuable suggestion. I will generate data from a normal distribution with various sample sizes (5, 10, 20, 30, ..., 100), calculate the Cp index for each, and then check it against perhaps 10 continuous distributions that I expect to be plausible. Are there any theoretical results or proofs about the distribution of Cp in the books that you mention? — Bioinformatics, Jul 14 '12 at 19:40
Try `fitdistrplus`. Here's a previous answer to a similar question http://stats.stackexchange.com/questions/8662/need-help-identifying-a-distribution-by-its-histogram/8674#8674 — bill_080, Jul 14 '12 at 20:11
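
As a rough illustration of the `fitdistr` plus `ks.test` approach mentioned in the comments, the sketch below fits a handful of candidate families and ranks them by log-likelihood, with the KS D statistic alongside. The candidate list, the simulated data `x`, and the `results` table are assumptions made for illustration; this is not an "all distributions" search, and the KS p-values are deliberately ignored because they are not valid when the parameters are estimated from the same data.

```r
library(MASS)  # provides fitdistr()

set.seed(1)
x <- rgamma(200, shape = 2, rate = 1)  # illustrative positive data; replace with your own

# Candidate families: names understood by fitdistr(), paired with the matching CDF.
candidates <- list(
  normal      = pnorm,
  lognormal   = plnorm,
  gamma       = pgamma,
  weibull     = pweibull,
  exponential = pexp
)

results <- do.call(rbind, lapply(names(candidates), function(nm) {
  fit <- suppressWarnings(fitdistr(x, nm))
  # KS distance between the data and the fitted CDF; the estimated
  # parameters are passed on to the CDF by name.
  ks <- do.call(ks.test, c(list(x, candidates[[nm]]), as.list(fit$estimate)))
  data.frame(distribution = nm,
             loglik = fit$loglik,
             ks_D = unname(ks$statistic))
}))

# Highest log-likelihood first; compare with the ks_D column.
results[order(-results$loglik), ]
```

Ranking by raw log-likelihood favors families with more parameters, so an information criterion such as `AIC(fit)` may be a fairer comparison when the candidates differ in complexity.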

score -1 · Accepted Answer · answered Jul 14 '12 at 19:47

For a normal distribution Bill Heavlin has a method for constructing confidence intervals for Cpk. I discuss it in my book and compare it to my bootstrap confidence intervals for my example.
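
For readers who want to see what a bootstrap interval for Cpk can look like in R, here is a minimal sketch using a plain percentile bootstrap. It is not Heavlin's method and not necessarily the procedure described in the book; the specification limits `USL` and `LSL`, the sample `x`, and the number of resamples `B` are all hypothetical values chosen for illustration.

```r
set.seed(7)
USL <- 13; LSL <- 7                     # hypothetical specification limits
x <- rnorm(50, mean = 10.3, sd = 0.9)   # illustrative sample; replace with real data

# Usual point estimate: Cpk = min(USL - mean, mean - LSL) / (3 * sd)
cpk <- function(x) min(USL - mean(x), mean(x) - LSL) / (3 * sd(x))

B <- 2000
# Resample the data with replacement and recompute Cpk each time.
boot_cpk <- replicate(B, cpk(sample(x, replace = TRUE)))

cpk_hat <- cpk(x)
ci <- quantile(boot_cpk, c(0.025, 0.975))  # plain percentile interval
cpk_hat
ci
```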

Michael R. Chernick

This is an answer to a secondary question by the OP. Why isn't it appropriate as an answer? – Michael R. Chernick Jul 14 '12 at 20:24
Is there any reference for theoretical results or proofs about the distribution of Cp? – Bioinformatics Jul 15 '12 at 06:42
I am sure that whatever was published at the time of the Johnson and Kotz book would be covered in there. – Michael R. Chernick Jul 15 '12 at 12:43
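
On the distribution of Cp under normality: for independent normal observations, (n - 1) * S^2 / sigma^2 follows a chi-square distribution with n - 1 degrees of freedom, so the estimate Cp_hat = (USL - LSL) / (6 * S) can be written as Cp * sqrt((n - 1) / X), where X is chi-square on n - 1 degrees of freedom. The sketch below checks this against simulation for a few sample sizes. The specification limits, process mean, and sigma are hypothetical values, and the relation only holds when the data really are normal.

```r
set.seed(42)
USL <- 13; LSL <- 7; mu <- 10; sigma <- 1   # hypothetical limits and process
cp_true <- (USL - LSL) / (6 * sigma)        # true Cp for these settings

# Simulate the sampling distribution of Cp_hat for sample size n.
sim_cp <- function(n, reps = 10000) {
  replicate(reps, (USL - LSL) / (6 * sd(rnorm(n, mu, sigma))))
}

probs <- c(0.05, 0.5, 0.95)
for (n in c(5, 10, 30, 100)) {
  simulated <- quantile(sim_cp(n), probs)
  # Cp_hat = Cp * sqrt((n - 1) / X) with X ~ chi-square(n - 1);
  # upper chi-square quantiles map to lower Cp_hat quantiles.
  theoretical <- cp_true * sqrt((n - 1) / qchisq(1 - probs, df = n - 1))
  cat("n =", n, "\n")
  print(round(rbind(simulated, theoretical), 3))
}
```

The simulated and theoretical quantiles should agree closely at every sample size, with the spread of Cp_hat noticeably wider (and skewed to the right) for small n.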
