
I've been reading about the problem of choosing an appropriate binwidth for histograms, and here's my broad-level understanding so far:

If we have $n$ data points, we assume that they're realizations of $n$ random variables following an unknown distribution $f$. Histograms are essentially density estimators that attempt to derive an estimate $\hat{f}$ of the underlying distribution. $\hat{f}$ depends on our choice of binwidth $h$, and an appropriate choice would give us an estimate that is close to the original distribution. The "closeness" is characterized by the integrated mean squared error (IMSE).
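
For concreteness, the criterion as I understand it is (my notation, so apologies if it's slightly non-standard):

$$\mathrm{IMSE}(h) = \mathbb{E}\left[\int \big(\hat{f}_h(x) - f(x)\big)^2 \, dx\right],$$

where $\hat{f}_h$ is the histogram estimate built with binwidth $h$, and the expectation is taken over the sample.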

There are a few rules for binning given on Wikipedia, such as the Freedman–Diaconis rule, Sturges' rule, and so on. My question is: how were these rules derived in the first place, given that we don't know $f$? If we don't know $f$, we can't explicitly calculate the IMSE, and no optimization can be done. Were these rules applied to simulated data sets generated from a wide variety of probability distributions, and selected because they worked in most of those cases?
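
To make sure I'm referring to the same rules, here's a quick sketch of how I understand two of them are computed (formulas as given on Wikipedia; I'm using numpy purely as a cross-check, since it appears to implement both):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(500)   # toy sample; any 1-D data would do
n = len(data)

# Sturges' rule: number of bins k = ceil(log2(n)) + 1
k_sturges = int(np.ceil(np.log2(n))) + 1

# Freedman-Diaconis rule: binwidth h = 2 * IQR * n^(-1/3)
q75, q25 = np.percentile(data, [75, 25])
h_fd = 2 * (q75 - q25) * n ** (-1 / 3)
k_fd = int(np.ceil((data.max() - data.min()) / h_fd))

print(f"Sturges:           {k_sturges} bins")
print(f"Freedman-Diaconis: h = {h_fd:.3f}, about {k_fd} bins")

# numpy's built-in bin selectors, for comparison
print(len(np.histogram_bin_edges(data, bins="sturges")) - 1)
print(len(np.histogram_bin_edges(data, bins="fd")) - 1)
```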

I'm not looking for exact derivations at this point, just the paradigm.

Shirish Kulhari
  • Some discussion of Sturges' rule and its connection to Doane's formula (and pointers to a paper by Rob Hyndman discussing a problem with them) is [here](https://stats.stackexchange.com/questions/55134/doanes-formula-for-histogram-binning/55205). Discussion of some other rules can be found on site as well. Try some searches for [Freedman-Diaconis](https://stats.stackexchange.com/search?q=Freedman-Diaconis) for example – Glen_b May 30 '17 at 11:53

0 Answers