
This is perhaps a basic question, but is there a method or definition for choosing the optimal bin size when plotting data as a histogram?

At present, the best option I can think of is to fit distribution functions to the histograms and choose the bin size that gives the best fit.

Is there a definition based on the data size or some other metric?

Q.P.



Rob Hyndman answers this question well using the Freedman-Diaconis rule.
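The Freedman-Diaconis rule sets the bin width to h = 2 × IQR × n^(−1/3), where IQR is the interquartile range and n is the number of observations. Below is a minimal pure-Python sketch of that formula. It is not Hyndman's code. The function names are hypothetical, and the choice of inclusive (linear-interpolation) quartiles is an assumption, since quartile conventions differ between tools.

```python
import math
import statistics


def freedman_diaconis_width(data):
    """Bin width h = 2 * IQR * n**(-1/3) from the Freedman-Diaconis rule."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Assumed convention: quartiles by linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero, so the rule gives no usable width")
    return 2 * iqr / n ** (1 / 3)


def freedman_diaconis_bins(data):
    """Number of equal-width bins needed to cover [min, max] at the FD width."""
    width = freedman_diaconis_width(data)
    return max(1, math.ceil((max(data) - min(data)) / width))


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(freedman_diaconis_width(data))  # 3.5
print(freedman_diaconis_bins(data))   # 2
```

Because the width scales with the IQR rather than the standard deviation, a few extreme outliers have little effect on it. The trade-off is that a long, sparse tail can push the resulting bin count quite high.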

I would argue, though, that there is no "optimal" bin size: the right number of bins depends on the question you're trying to answer with the histogram. To that end, plotting the data with several different bin sizes is probably a good idea.
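Comparing several bin counts side by side can be sketched without any plotting library. The sketch below assumes equal-width bins and uses a hypothetical two-component Gaussian mixture as data. The helper names and the bin counts 5, 15 and 50 are illustrative choices, not recommendations.

```python
import random


def histogram_counts(data, bins):
    """Count observations in `bins` equal-width bins spanning [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The maximum lands on the right edge, so clamp it into the last bin.
        idx = min(int((x - lo) / width), bins - 1) if width > 0 else 0
        counts[idx] += 1
    return counts


def text_histogram(counts, max_bar=40):
    """Render counts as rows of '#', scaled so the tallest bin is max_bar wide."""
    peak = max(counts)
    return "\n".join("#" * round(max_bar * c / peak) for c in counts)


# Hypothetical example data: a two-component Gaussian mixture.
random.seed(0)
data = [random.gauss(0, 1) for _ in range(500)] + [random.gauss(3, 0.5) for _ in range(200)]

for bins in (5, 15, 50):
    print(f"--- {bins} bins ---")
    print(text_histogram(histogram_counts(data, bins)))
```

With very few bins the two modes may blur together, and with many bins sampling noise starts to dominate. Which view is most useful depends on what you want the histogram to show.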

Also consider kernel density estimation, an alternative approach that doesn't use bins at all but does require you to choose an appropriate kernel and bandwidth.
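Below is a minimal sketch of a kernel density estimate, assuming a Gaussian kernel and Silverman's rule-of-thumb bandwidth, h = 0.9 × min(sd, IQR/1.34) × n^(−1/5). Both are illustrative choices, not something the answer prescribes. The function names and example data are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(data):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n**(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fall back to the standard deviation alone if the IQR is zero.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    if spread == 0:
        raise ValueError("data has zero spread")
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde(data, bandwidth):
    """Return f(x) = 1/(n*h) * sum K((x - x_i)/h) with a standard normal kernel K."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


# Hypothetical example data.
data = [1.2, 1.9, 2.1, 2.4, 3.3, 3.8, 4.0, 6.5]
h = silverman_bandwidth(data)
f = gaussian_kde(data, h)
for x in (0, 2, 4, 6, 8):
    print(f"f({x}) = {f(x):.4f}")
```

The bandwidth plays a role similar to the bin width. A smaller h gives a more detailed but noisier curve, and a larger h gives a smoother curve that can hide features, so it is worth trying a few values here as well.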

Callum Webb