This is something my lecturer said, but I can't find out why it is the case. I have to convert a continuous variable into a categorical one, and the data are left-skewed. Is it more important to have equal ranges or equal amounts of data in each range? Why?
- Why do this anyway? – Nick Cox Apr 13 '18 at 11:07
- Possible duplicate of [What is the benefit of breaking up a continuous predictor variable?](https://stats.stackexchange.com/questions/68834/what-is-the-benefit-of-breaking-up-a-continuous-predictor-variable) – kjetil b halvorsen Apr 13 '18 at 12:47
- There is much good advice in the thread identified as duplicate, but it isn't what the OP is asking at all. A duplicate isn't ever the different question you should be asking. – Nick Cox Apr 13 '18 at 13:00
1 Answer
My answer is: it depends.
It depends on why you are discretizing the variable. If you subsequently want to build a statistical classifier from the discretized variable(s), you need to choose a binning (that is, the full set of thresholds defining the bins) that is optimal for discriminating between the categories. If your purpose is merely to display the variable's distribution in a histogram, you will often prefer a uniform binning, in which every bin has the same width.
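To make the two options in the question concrete, here is a minimal sketch comparing equal-width bins with equal-count (quantile) bins on a left-skewed sample. The data are synthetic, and the function names (`equal_width_edges`, `equal_frequency_edges`, `bin_counts`) and the choice of four bins are assumptions for illustration only. Neither scheme is the classifier-optimal binning mentioned above; they are just the two simple cases being asked about.

```python
import random
from bisect import bisect_right

def equal_width_edges(values, k):
    """Interior thresholds that split [min, max] into k bins of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(1, k)]

def equal_frequency_edges(values, k):
    """Interior thresholds taken at sample quantiles, so bins hold similar counts."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]

def bin_counts(values, edges):
    """Count how many values fall into each bin defined by the sorted thresholds."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts

random.seed(0)
# Synthetic left-skewed sample (an assumption): a constant minus an
# exponential draw gives a long tail towards small values.
data = [10.0 - random.expovariate(1.0) for _ in range(1000)]

width_counts = bin_counts(data, equal_width_edges(data, 4))
freq_counts = bin_counts(data, equal_frequency_edges(data, 4))

print("equal-width counts:    ", width_counts)
print("equal-frequency counts:", freq_counts)
```

With skewed data, the equal-width bins end up very unevenly filled (most observations land in the top bin), while the equal-count bins are balanced but cover ranges of very different widths.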

Match Maker EE