I have a set of data points that I want to model using one of several standard distributions. I do not know which distribution will provide the best fit. Is there a standard approach for determining which distribution will provide the best fit? Ultimately, I would like to write software to do this.

Bob

  • 1
    This is an awfully broad question... and why would you want to write software? Plenty of tools already exist. – jbowman Feb 26 '18 at 03:47
  • 1
    Numerous questions on site related to fitting of distributions to data discuss important aspects of the issue; it may pay to review some of them. The first post under "Related" in the sidebar on the right -> might be one place to start. Another one is [How to determine which distribution fits my data best?](https://stats.stackexchange.com/questions/132652/how-to-determine-which-distribution-fits-my-data-best) – Glen_b Feb 26 '18 at 10:00

1 Answer

Don't bother writing the software unless you're doing it for your own edification; MATLAB's Statistics and Machine Learning Toolbox already includes a Distribution Fitter app (https://www.mathworks.com/help/stats/distributionfitter.html).

Standard approach? Novices pray the data is normally distributed, and check with histograms, Q-Q plots, and goodness-of-fit tests. And if it's not, they apply rules of thumb or a Box-Cox transformation to beat the data into submission.
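
To make the normality check and the transformation step concrete, here is a minimal sketch using only the Python standard library. It measures how straight a normal Q-Q plot would be (the correlation between the sorted data and standard normal quantiles) before and after a Box-Cox transform with lambda = 0, which is the log transform. The helper names `qq_correlation` and `box_cox`, the synthetic lognormal data, and the plotting-position formula are all assumptions made for illustration, not part of any particular tool.

```python
import math
import random
from statistics import NormalDist, mean


def qq_correlation(sample):
    """Pearson correlation between sorted data and standard normal quantiles.

    Values near 1 mean the normal Q-Q plot is close to a straight line.
    """
    xs = sorted(sample)
    n = len(xs)
    # Plotting positions (i + 0.5) / n for i = 0..n-1; other conventions exist.
    qs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, mq = mean(xs), mean(qs)
    sxy = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    return sxy / math.sqrt(sxx * sqq)


def box_cox(sample, lam):
    """Box-Cox transform for strictly positive data; lam == 0 is the log."""
    if lam == 0:
        return [math.log(x) for x in sample]
    return [(x ** lam - 1) / lam for x in sample]


# Synthetic right-skewed data, used here only as a stand-in for real data.
rng = random.Random(0)
data = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]

r_raw = qq_correlation(data)
r_log = qq_correlation(box_cox(data, 0))
print(f"Q-Q correlation, raw data:           {r_raw:.3f}")
print(f"Q-Q correlation, after log (lam=0):  {r_log:.3f}")
```

A correlation close to 1 only says the transformed data look roughly normal; it is not a formal goodness-of-fit test, and in practice lambda would be estimated from the data rather than fixed at 0.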

Statisticians with a "feel for phenomena" consider the process being modeled, and narrow their search down to something appropriate. For example, lifetime or time-to-event data is usually modeled with exponential, gamma, Weibull, or lognormal distributions (and some other more exotic ones).
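
Once the candidates are narrowed down this way, one common way to compare them is to fit each by maximum likelihood and rank the fits by AIC. The sketch below does that for exponential, lognormal, and Weibull candidates using only the Python standard library. The function names, the synthetic Weibull data, and the choice of AIC as the criterion are assumptions for illustration. The gamma distribution is left out only because its shape estimate needs the digamma function, which the standard library does not provide.

```python
import math
import random


def fit_exponential(xs):
    """Return (max log-likelihood, number of parameters) for an exponential."""
    n = len(xs)
    rate = n / sum(xs)  # closed-form MLE
    return n * math.log(rate) - rate * sum(xs), 1


def fit_lognormal(xs):
    """Return (max log-likelihood, number of parameters) for a lognormal."""
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n  # MLE uses 1/n, not 1/(n-1)
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * var) - 0.5 * n, 2


def fit_weibull(xs):
    """Return (max log-likelihood, number of parameters) for a Weibull."""
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / n
    top = max(xs)

    def score(k):
        # Profile likelihood equation for the shape k; increasing in k.
        # Dividing by max(xs) leaves the ratio unchanged and avoids overflow.
        w = [(x / top) ** k for x in xs]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):  # plain bisection
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    scale = (sum(x ** k for x in xs) / n) ** (1.0 / k)
    loglik = (n * math.log(k) - n * k * math.log(scale)
              + (k - 1) * sum(logs) - sum((x / scale) ** k for x in xs))
    return loglik, 2


def rank_by_aic(xs):
    """Fit every candidate and return (name, AIC) pairs, lowest AIC first."""
    candidates = {"exponential": fit_exponential,
                  "lognormal": fit_lognormal,
                  "weibull": fit_weibull}
    scores = {}
    for name, fit in candidates.items():
        loglik, n_params = fit(xs)
        scores[name] = 2 * n_params - 2 * loglik
    return sorted(scores.items(), key=lambda kv: kv[1])


# Synthetic positive "lifetime" data: Weibull with scale 2.0 and shape 1.5.
rng = random.Random(1)
data = [rng.weibullvariate(2.0, 1.5) for _ in range(2000)]

ranking = rank_by_aic(data)
for name, aic in ranking:
    print(f"{name:12s} AIC = {aic:.1f}")
```

The lowest AIC only identifies the best of the candidates you supplied; it does not show that the winner fits well, so a Q-Q plot or similar check against the chosen distribution is still worth doing.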

A humble statistician goes to the source of the data, the researcher who collected it and has the best insight into the process being measured. The engineer, or chemist, or cancer researcher probably doesn't know the names of all the distributions used to analyze their problems, but they can get you into the appropriate literature, and suggest folks who've studied similar problems.

Mike Anderson