When to use Equal-Frequency-Histograms

Question

When should one use equal-frequency histograms instead of, e.g., the popular equal-width histograms?

Additional question: What is a good/robust rule of thumb for calculating the number of bins for equal-frequency histograms (like the Freedman-Diaconis rule for equal-width histograms)?

I really wonder what an equal-frequency histogram might look like. My intuition is that it is really flat. Can you give an example? – Henrik, Jan 31 '11 at 16:58
@Henrik: See e.g. this question (http://stats.stackexchange.com/questions/5573/how-to-build-an-equilibrated-histogram). Yes, it is flat, so it clearly cannot be used for density estimation ;). However, since the equal-width approach is so generic, it seems that it can be applied in every situation where equal-freq can be applied. So when should one favor equal-freq? – mlwida, Jan 31 '11 at 17:09
@Henrik No, an equal frequency histogram generally is *not* flat. Histograms are commonly confused with bar charts, which display values by means of the *heights* of bars. However, by definition, a histogram displays frequencies by means of *areas*. Consider (*e.g.*) the data {0,1,2,4,8,16,32,64}, to be shown in the range [0,100] with two bins. The break for an equal-frequency histogram has to be between 4 and 8. If we put it at 6, the height of the left bar *multiplied* by (6-0) = 6 equals 4, whence the height is 4/6. The height of the right bar equals 4/(100-6) = 4/94. Not flat at all! — whuber, Jan 31 '11 at 19:29
(Continued) See a Wikipedia example of a variable-width histogram at http://en.wikipedia.org/wiki/File:Travel_time_histogram_total_n_Stata.png , which is an illustration for its article on "Histogram." — whuber, Jan 31 '11 at 19:31
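The two-bin example in whuber's comment can be checked with a few lines of Python. This is only a minimal sketch, not code from the thread: the helper name `bar_heights` is made up for illustration, and it assumes bins that are half-open on the right, except for the last bin, which includes its upper edge.

```python
def bar_heights(data, edges):
    """Return count / width per bin, so that each bar's area equals its count.

    Assumption for this sketch: bins are [lo, hi), except the last one,
    which is [lo, hi] so that the maximum of the range is included.
    """
    heights = []
    last = len(edges) - 2
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        if i == last:
            count = sum(1 for x in data if lo <= x <= hi)
        else:
            count = sum(1 for x in data if lo <= x < hi)
        heights.append(count / (hi - lo))
    return heights


data = [0, 1, 2, 4, 8, 16, 32, 64]
edges = [0, 6, 100]          # range [0, 100], single break at 6
heights = bar_heights(data, edges)
print(heights)               # left bar is 4/6 high, right bar is 4/94 high
```

Both bars hold four points, so their areas are equal, yet their heights differ by a factor of more than fifteen.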
@steffen Your second question has already been asked and answered at http://stats.stackexchange.com/q/798/919 . More formulas appear at http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width . — whuber, Jan 31 '11 at 19:32
@whuber wow. It is weird to see that one still mixes up pretty basic things. Thanks a lot for pointing my mistake out! — Henrik, Jan 31 '11 at 20:50
@whuber: Thank you for your explanation. Your answer to the second question indicates that these rules do not necessarily require an equal-width histogram. I did not know that. – mlwida, Feb 01 '11 at 07:13
@whuber: Although it seems "strange" that one can use FD to calculate bin-width, then the number of bins and then the equal-freq-bin-width. — mlwida, Feb 01 '11 at 08:37
@Steffen Sorry, I was mistaken. Those rules are for equal-width histograms. For equal-frequency histograms the theory is different, because you determine the *area* of each bar in advance. Thus, variation in the area is proportional to the square root of the (common) bin population. Choosing that population is therefore a tradeoff between horizontal precision (number of bars) and areal precision; where to come down in that tradeoff is your decision. — whuber, Feb 01 '11 at 14:45
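One way to read the tradeoff described in the comment above, sketched in Python. The helper `tradeoff` is hypothetical, and the 1/sqrt(k) figure rests on an assumption that the thread does not spell out: that the population of a bin fluctuates roughly like a Poisson count, so a bin holding k points carries a relative uncertainty of about 1/sqrt(k).

```python
import math


def tradeoff(n, per_bin):
    """Return (number of bins, approximate relative uncertainty of a bar's area).

    Assumption for this sketch: a bin with `per_bin` points varies by about
    sqrt(per_bin) points, i.e. by a fraction 1 / sqrt(per_bin) of its area.
    """
    bins = n // per_bin
    relative_area_error = 1.0 / math.sqrt(per_bin)
    return bins, relative_area_error


# Fewer points per bin: more bars (horizontal detail) but noisier areas.
for per_bin in (10, 25, 100):
    bins, rel = tradeoff(1000, per_bin)
    print(f"{per_bin:4d} points per bin: {bins:3d} bins, relative areal error about {rel:.2f}")
```

The sketch does not pick a value for you. It only makes the two sides of the tradeoff visible for a given sample size.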

score 6 · Eponymous · Accepted Answer · 2012-07-19T15:38:19.587

This is not a proper or complete answer, but two observations from my personal experience:

An equal-frequency histogram will hide outliers (I've seen them in long, low bins).
The heights of the individual bins in an equal-frequency histogram seem more stable than in an equal-width histogram.

I use equal-frequency histograms mainly for exploratory analysis. They give me a better intuitive feel for the shape of the distribution than an equal-width histogram.

I am trying them now for an application where I am using a function of a histogram of the data as a distance metric for two very skewed distributions. An equal-width histogram would have almost all of the samples in one bin, whereas an equal-frequency histogram with the same number of bins will have many narrow bins in that area. Intuitively, if we consider the height of each bin as a variable, the equal-frequency histogram will spread the available distribution information more evenly among the variables.
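The contrast described above can be reproduced on synthetic data. The following is a rough sketch under stated assumptions, not the answerer's actual application: the heavy-tailed sample is deterministic and invented for illustration, and the helper names (`bin_counts`, `equal_width_edges`, `equal_frequency_edges`) are hypothetical.

```python
def bin_counts(values, edges):
    """Count values per bin; bins are [lo, hi) except the last, which is [lo, hi]."""
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for x in values:
        for j in range(len(counts)):
            upper_ok = x <= edges[j + 1] if j == last else x < edges[j + 1]
            if edges[j] <= x and upper_ok:
                counts[j] += 1
                break
    return counts


def equal_width_edges(values, bins):
    """Split [min, max] into `bins` intervals of identical width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + j * step for j in range(bins)] + [hi]


def equal_frequency_edges(values, bins):
    """Place edges at order statistics so each bin gets about n / bins points."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[j * n // bins] for j in range(bins)] + [ordered[-1]]


# Deterministic, strongly right-skewed sample (an assumption for illustration).
n, bins = 1000, 10
data = [1.0 / (1.0 - (i + 0.5) / n) for i in range(n)]

counts_w = bin_counts(data, equal_width_edges(data, bins))
counts_f = bin_counts(data, equal_frequency_edges(data, bins))
print("equal-width counts:    ", counts_w)
print("equal-frequency counts:", counts_f)
```

With the equal-width edges nearly every point lands in the first bin, while the equal-frequency edges give each bin the same share of the sample and put most of the edges where the data are dense.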

edited Jul 19 '12 at 15:38

answered Jul 16 '12 at 00:27

Eponymous


(+1) thank you for this helpful reply. It seems you have used them regularly. I am curious when and why you have preferred to use them (instead of e.g. equal-width). – mlwida Jul 16 '12 at 12:07
I use them mainly for exploratory analysis. They give me a better intuitive feel for the shape of the distribution than an equal-width histogram. I am trying them now for an application where I am using a function of a histogram of the data as a distance metric for two very skewed distributions. An equal-width histogram would have almost all of the samples in one bin, whereas an equal-frequency histogram will have many narrow bins in that area. Intuitively the equal-frequency histogram will better spread the available distribution information among the variables. – Eponymous Jul 16 '12 at 16:25
This sounds reasonable, thank you again! Could you be so kind as to merge your last comment into your answer? I'd like to accept it. – mlwida Jul 17 '12 at 07:07
There you go. My comment is now merged into the answer. – Eponymous Jul 19 '12 at 15:40

score 1 · Answer 2 · answered Feb 13 '11 at 22:14

Equi-depth histograms are a solution to the problem of quantization (mapping continuous values to discrete values).

The best number of bins really depends on what you are trying to do with the histogram. In general, I think it would be best to choose the bins so that your error measure of choice stays below some threshold (e.g. sum of squared errors < THRESH).

Alternatively, the number of bins can be passed in as a parameter (if you're concerned about the space consumption of the histogram).
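The threshold idea from this answer could look roughly like the sketch below. The answer does not say how the error is measured, so this assumes that each value is represented by the mean of its equal-frequency bin and that the error is the sum of squared deviations from that mean. All function names are hypothetical.

```python
def equal_frequency_chunks(sorted_values, bins):
    """Split sorted values into `bins` groups of (nearly) equal size."""
    n = len(sorted_values)
    cuts = [j * n // bins for j in range(bins + 1)]
    return [sorted_values[cuts[j]:cuts[j + 1]] for j in range(bins)]


def quantization_sse(chunks):
    """Sum of squared errors when each value is replaced by its bin mean."""
    total = 0.0
    for chunk in chunks:
        if not chunk:
            continue
        mean = sum(chunk) / len(chunk)
        total += sum((x - mean) ** 2 for x in chunk)
    return total


def smallest_bin_count(values, thresh):
    """Return the smallest number of equal-frequency bins with SSE < thresh.

    Falls back to one bin per value if no smaller count meets the threshold.
    """
    ordered = sorted(values)
    for bins in range(1, len(ordered) + 1):
        if quantization_sse(equal_frequency_chunks(ordered, bins)) < thresh:
            return bins
    return len(ordered)


values = [0, 1, 2, 4, 8, 16, 32, 64]
print(smallest_bin_count(values, thresh=600.0))
```

A linear search is used because, with equal-frequency bins, the error is not guaranteed to shrink at every step as bins are added, so the first count that meets the threshold is simply returned.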

answered Feb 13 '11 at 22:14

Nick


Thank you for the response. However, I see no value in it: 1. As far as I can see, quantization is not focused (primarily or solely) on equal-freq histograms. 2. Determining the number of bins by hand or by automatic optimization (via sum of squared errors) is an approach that can be applied anywhere. – mlwida Feb 14 '11 at 09:10
"No value" was a little bit too harsh. I meant: "no value" for the specific nature of my question, which is focused on equal-freq histograms (and rules of thumb for them). – mlwida Feb 14 '11 at 11:19
