Why is it useful to have the same amount of data in each level of a categorical variable?

Question

My lecturer said this, but I can't find out why it is the case. I have to turn a continuous variable into a categorical one, and the data is left skewed. Is it more important to have equal ranges or equal amounts of data in each range? Why?

Possible duplicate of [What is the benefit of breaking up a continuous predictor variable?](https://stats.stackexchange.com/questions/68834/what-is-the-benefit-of-breaking-up-a-continuous-predictor-variable) — kjetil b halvorsen, Apr 13 '18 at 12:47
There is much good advice in the thread identified as duplicate, but it isn't what the OP is asking at all. A duplicate isn't ever the different question you should be asking. — Nick Cox, Apr 13 '18 at 13:00

score 1 · Answer 1 · answered Apr 13 '18 at 12:14

My answer is - it depends.

It depends on why you are discretizing the variable. If you subsequently want to build a statistical classifier from the discretized variable(s), you need to choose a binning (that is, the full set of thresholds defining the interval bins) that is optimal for discerning the categories. If your purpose is merely to display the variable's distribution in a histogram, you will often prefer a uniform binning, where every bin has the same width.
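To make the two options from the question concrete, here is a minimal sketch comparing equal-width bins with equal-frequency (quantile) bins on a left-skewed sample. The data are simulated from a Beta(5, 1) distribution purely for illustration, and the helper names (`equal_width_edges`, `equal_frequency_edges`, `bin_counts`) and the choice of four bins are my own assumptions, not anything from the question.

```python
import random
import statistics
from bisect import bisect_right


def equal_width_edges(values, k):
    """Interior cut points that split [min, max] into k bins of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Interior cut points at the k-quantiles, so bins hold roughly equal counts."""
    return statistics.quantiles(values, n=k)


def bin_counts(values, edges):
    """Count how many values fall into each bin defined by the interior cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


random.seed(0)
# Hypothetical left-skewed sample: most values near 1, a long tail toward 0.
data = [random.betavariate(5, 1) for _ in range(1000)]

k = 4  # assumed number of categories
width_counts = bin_counts(data, equal_width_edges(data, k))
freq_counts = bin_counts(data, equal_frequency_edges(data, k))

print("equal width:    ", width_counts)
print("equal frequency:", freq_counts)
```

With skewed data, the equal-width bins in the tail hold very few observations, while the quantile-based bins are balanced by construction but have unequal widths. Which trade-off is acceptable depends on the purpose, as described above.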
