... instead of e.g. the popular equal-width histograms.
Additional question: What is a good/robust rule of thumb for calculating the number of bins for equal-frequency histograms (analogous to the Freedman-Diaconis rule for equal-width histograms)?
This is not a proper or complete answer, but two observations from my personal experience:
An equal-frequency histogram tends to hide outliers: they get absorbed into long, low bins at the tails, where they are easy to overlook.
The heights of the individual bins in an equal-frequency histogram seem more stable than in an equal-width histogram.
I use equal-frequency histograms mainly for exploratory analysis. They give me a better intuitive feel for the shape of the distribution than an equal-width histogram.
I am trying them now for an application where I use a function of a histogram of the data as a distance metric between two very skewed distributions. An equal-width histogram would put almost all of the samples in one bin, whereas an equal-frequency histogram with the same number of bins will place many narrow bins in that region. Intuitively, if we treat the height of each bin as a variable, the equal-frequency histogram spreads the available information about the distribution more evenly across those variables.
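To illustrate the point about skewed data, here is a small sketch (my own illustration, not part of the original application) that builds both kinds of bins for a heavily skewed synthetic sample; the equal-frequency edges are simply sample quantiles, and the lognormal data stands in for the real distributions.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=2.0, size=10_000)  # stand-in for a very skewed distribution
n_bins = 10

# Equal-width bins: almost all samples land in the first bin.
width_counts, width_edges = np.histogram(data, bins=n_bins)

# Equal-frequency bins: edges are sample quantiles, so each bin holds ~1000 samples.
freq_edges = np.quantile(data, np.linspace(0.0, 1.0, n_bins + 1))
freq_counts, _ = np.histogram(data, bins=freq_edges)

print("equal-width counts:    ", width_counts)
print("equal-frequency counts:", freq_counts)
```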
Equi-depth histograms are a solution to the problem of quantization (mapping continuous values to discrete values).
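As a hedged illustration of that quantization view (the helper name `equi_depth_quantize` is my own, not from this answer), each continuous value can be mapped to a discrete code by looking it up against the quantile edges:

```python
import numpy as np

def equi_depth_quantize(values, n_bins):
    """Map continuous values to discrete codes 0..n_bins-1 using equi-depth edges."""
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))
    # Only the interior edges are needed; digitize assigns each value to its bin.
    return np.digitize(values, edges[1:-1], right=True)

x = np.random.default_rng(1).exponential(size=1000)
codes = equi_depth_quantize(x, n_bins=8)
print(np.bincount(codes))  # roughly 125 values per code
```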
For finding the best number of bins, I think it really depends on what you are trying to do with the histogram. In general, I would make sure your error measure of choice stays below some threshold (e.g. sum of squared errors < THRESH) and choose the smallest number of bins that achieves it.
Alternatively, the number of bins can be passed in as a parameter (if you're concerned about the space consumption of the histogram).
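A minimal sketch of the error-threshold idea above, assuming "sum of squared errors" means the quantization error obtained by replacing every value with its bin mean (one possible reading; the function name `bins_for_sse` and the 95%-of-total-sum-of-squares target below are my own choices):

```python
import numpy as np

def bins_for_sse(values, thresh, max_bins=256):
    """Smallest equi-depth bin count whose quantization SSE falls below `thresh`."""
    values = np.asarray(values, dtype=float)
    for n_bins in range(1, max_bins + 1):
        edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))
        # Assign each value to its bin via the interior edges.
        idx = np.searchsorted(edges[1:-1], values, side="right")
        # Reconstruct every value by its bin mean and sum the squared errors.
        bin_means = np.array([values[idx == b].mean() for b in range(n_bins)])
        sse = float(np.sum((values - bin_means[idx]) ** 2))
        if sse <= thresh:
            return n_bins, sse
    return max_bins, sse

x = np.random.default_rng(2).exponential(size=5000)
target = 0.05 * np.sum((x - x.mean()) ** 2)  # e.g. retain 95% of the total sum of squares
print(bins_for_sse(x, thresh=target))
```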