Defining bin sizes for plotting histograms

Question

Perhaps a basic question, but is there a standard method or rule for choosing the optimal bin size for data that will be plotted as a histogram?

At present the best option I can think of is to fit distribution functions to the histograms and choose the bin size where the fit is best.

Is there a rule based on the sample size or some other metric?

score 0 · Accepted Answer · answered Nov 26 '17 at 05:53

Rob Hyndman answers this question well using the Freedman-Diaconis rule.
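For reference, the Freedman-Diaconis rule sets the bin width to 2 * IQR / n^(1/3), where IQR is the interquartile range and n is the number of observations. Here is a minimal sketch using only the standard library. The function names are my own, and the choice of the "inclusive" quartile method is an assumption (other quartile conventions give slightly different widths).

```python
import math
import statistics


def freedman_diaconis_width(data):
    """Bin width from the Freedman-Diaconis rule: 2 * IQR / n**(1/3)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # The "inclusive" method is an assumption; other conventions exist.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return 2.0 * (q3 - q1) / n ** (1.0 / 3.0)


def freedman_diaconis_bins(data):
    """Number of equal-width bins needed to cover the data at that width."""
    width = freedman_diaconis_width(data)
    if width == 0:
        # Zero IQR (e.g. many repeated values): fall back to a single bin.
        return 1
    return max(1, math.ceil((max(data) - min(data)) / width))
```

If you already use NumPy, I believe `numpy.histogram_bin_edges(data, bins="fd")` applies the same rule, so you may not need to roll your own.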

I would argue though that there is no "optimal" bin size, and the right number is likely to depend on the question you're trying to answer by plotting a histogram. To that end, plotting the data with several different bin sizes is probably a good idea.
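To get a feel for how much the picture changes with the bin count, you can tabulate the same data at a few settings before plotting. This is only a rough sketch: `histogram_counts` is a hypothetical helper of mine, and the sample values are made up for illustration.

```python
def histogram_counts(data, bins):
    """Count observations in `bins` equal-width bins spanning min..max."""
    lo, hi = min(data), max(data)
    counts = [0] * bins
    if hi == lo:
        # All values identical: put everything in the first bin.
        counts[0] = len(data)
        return counts
    width = (hi - lo) / bins
    for x in data:
        # The maximum value is clamped into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Made-up sample, purely for illustration.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]
for bins in (2, 4, 8):
    print(bins, histogram_counts(sample, bins))
```

With too few bins the shape is flattened out, and with too many you mostly see noise, which is why looking at several settings tends to be more informative than trusting any single rule.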

Also check out kernel density estimation as another approach, which doesn't involve bins at all but does require you to select an appropriate kernel and bandwidth.
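As a rough illustration of what a kernel density estimate computes, here is a sketch with a Gaussian kernel and a bandwidth you supply yourself. The name `make_gaussian_kde` is hypothetical, and fixing the kernel to a Gaussian is my assumption; the bandwidth choice is left entirely to you, since that is the part that plays the role of the bin size.

```python
import math


def make_gaussian_kde(data, bandwidth):
    """Return a function estimating the density of `data` at a point x.

    Each observation contributes a Gaussian bump of standard deviation
    `bandwidth`; the estimate is the average of those bumps.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density
```

A small bandwidth gives a spiky estimate and a large one oversmooths, much like narrow versus wide bins. In practice a library routine such as `scipy.stats.gaussian_kde` is probably the more convenient option.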
