1

Is there any function in R that fits all statistical distributions and chooses the best fit based on the log-likelihood and the Kolmogorov-Smirnov D (KSD) statistic?

Something like the software EasyFit.

Bioinformatics
  • 1
    See `?ecdf` (a joke, but only in part; I'm sure others will explain...) – Aaron left Stack Overflow Jul 14 '12 at 18:33
  • 2
    The `R` commands [`fitdistr`](http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/fitdistr.html) and [`ks.test`](http://stat.ethz.ch/R-manual/R-patched/library/stats/html/ks.test.html) do this for some distributions. Fitting *all* the distributions is 'pretty much' impossible. –  Jul 14 '12 at 18:35
  • Usually I use EasyFit for this purpose because it does this task pretty well. However, I am now running simulations, so I am looking for equivalent commands in R. – Bioinformatics Jul 14 '12 at 18:37
  • 3
    I'm not sure what "all statistical distributions" is supposed to include. I'm not aware that there is a fixed, finite list of possible distributions. Certainly there is a relatively small number of distributions that account for >99% of all analyses that are actually run, but even then, this doesn't make sense to me. Typically, people think about which distributions you might consider by thinking about what the data stand for and how they were gathered. That is, most of that work is a priori. Testing against a distribution is usually just a check to see that it's not too unreasonable. – gung - Reinstate Monica Jul 14 '12 at 18:45
  • Actually, the data are about process capability [Cp = (USL - LSL)/(6*sigma)]. I want to see the distribution of Cp by simulation under different sample sizes. – Bioinformatics Jul 14 '12 at 18:57
  • Process capability indices have very different distributions depending on the population distribution. For this reason, judging capability by what is appropriate for the normal distribution can be misleading when the actual population distribution has very short or very heavy tails. I discuss bootstrapping the Cpk statistic in my bootstrap text, and Kotz has written two books about capability indices. See the links below. http://www.amazon.com/Bootstrap-Methods-Practitioners-Researchers-Probability/dp/0471756210/ref=sr_1_2?s=books&ie=UTF8&qid=1342294274&sr=1-2&keywords=Michael+Chernick – Michael R. Chernick Jul 14 '12 at 19:32
  • For Kotz: http://www.amazon.com/Process-Capability-Indices-Samuel-Kotz/dp/041254380X/ref=sr_1_1?s=books&ie=UTF8&qid=1342294361&sr=1-1&keywords=process+capability+indices and http://www.amazon.com/Process-Capability-Indices-Theory-Practice/dp/0340691778/ref=sr_1_4?s=books&ie=UTF8&qid=1342294416&sr=1-4&keywords=process+capability+indices – Michael R. Chernick Jul 14 '12 at 19:34
  • I agree with Gung. Usually you test one or just a few tentatively selected distributions. – Michael R. Chernick Jul 14 '12 at 19:35
  • Thanks @Michael for the books and the valuable suggestion. I will generate data from a normal distribution with various sample sizes (5, 10, 20, 30, ..., 100), then calculate the Cp index and check it against perhaps 10 continuous distributions (the ones I am expecting). Are there any theoretical results or proofs about the distribution of Cp in the books that you mention? – Bioinformatics Jul 14 '12 at 19:40
  • Try `fitdistrplus`. Here's a previous answer to a similar question http://stats.stackexchange.com/questions/8662/need-help-identifying-a-distribution-by-its-histogram/8674#8674 – bill_080 Jul 14 '12 at 20:11
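
A minimal sketch of the workflow discussed in the comments, assuming a small hand-picked set of candidate distributions rather than "all" of them: simulate the estimated Cp for normal samples, fit each candidate with `fitdistr` from `MASS`, and rank the fits by log-likelihood alongside the Kolmogorov-Smirnov D statistic from `ks.test`. The specification limits, process parameters, candidate list, and the helper `compare_fits` are illustrative assumptions, not something given in the thread. Only D is reported, because the `ks.test` p-value is not valid when the parameters are estimated from the same data.

```r
library(MASS)  # provides fitdistr()

set.seed(1)

# Hypothetical spec limits and process parameters (assumptions for illustration)
usl <- 13; lsl <- 7; mu <- 10; sigma <- 1
n <- 20        # observations per simulated sample
n_sim <- 2000  # number of simulated Cp estimates

# Estimated Cp = (USL - LSL) / (6 * s) for each simulated normal sample
cp_hat <- replicate(n_sim, (usl - lsl) / (6 * sd(rnorm(n, mu, sigma))))

# Candidate distributions: fitdistr() name -> matching CDF (an arbitrary short list)
candidates <- list(
  normal    = pnorm,
  lognormal = plnorm,
  gamma     = pgamma,
  weibull   = pweibull
)

# Hypothetical helper: fit every candidate and collect log-likelihood and KS D
compare_fits <- function(x, candidates) {
  rows <- lapply(names(candidates), function(nm) {
    # Numerical optimisation can emit harmless warnings for gamma/weibull
    fit <- suppressWarnings(fitdistr(x, nm))
    # Pass the fitted parameters by name to the candidate CDF
    ks <- do.call(ks.test, c(list(x, candidates[[nm]]), as.list(fit$estimate)))
    data.frame(dist = nm, loglik = fit$loglik, ks_D = unname(ks$statistic))
  })
  out <- do.call(rbind, rows)
  out[order(-out$loglik), ]  # best log-likelihood first
}

result <- compare_fits(cp_hat, candidates)
print(result)
```

Changing `n` to each sample size of interest and rerunning gives one ranking per sample size. The ranking only compares the listed candidates; it says nothing about distributions left out of the list.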

1 Answer

-1

For a normal distribution, Bill Heavlin has a method for constructing confidence intervals for Cpk. I discuss it in my book and compare it to my bootstrap confidence intervals for my example.
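
As a rough illustration of the bootstrap idea only, here is a generic percentile bootstrap interval for Cpk. It is a sketch under stated assumptions, not the procedure from the book and not Heavlin's method: the specification limits, the simulated sample, and the helper `cpk` are all hypothetical, and the usual estimator Cpk = min(USL - mean, mean - LSL) / (3 * s) is assumed.

```r
set.seed(42)

# Hypothetical specification limits and sample (assumptions for illustration)
usl <- 13; lsl <- 7
x <- rnorm(30, mean = 10.5, sd = 1)

# Usual estimator: Cpk = min(USL - mean, mean - LSL) / (3 * s)
cpk <- function(x, lsl, usl) min(usl - mean(x), mean(x) - lsl) / (3 * sd(x))

# Percentile bootstrap: resample with replacement and recompute Cpk each time
n_boot <- 2000
boot_cpk <- replicate(n_boot, cpk(sample(x, replace = TRUE), lsl, usl))
ci <- quantile(boot_cpk, c(0.025, 0.975))  # 95% percentile interval

cpk_hat <- cpk(x, lsl, usl)
print(cpk_hat)
print(ci)
```

The percentile interval is the simplest bootstrap variant; with small samples or skewed data other variants may behave differently.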

Michael R. Chernick