
This is something my lecturer said, but I can't find out why it is the case. I have to turn a continuous variable into a categorical one, and the data are left-skewed. Is it more important to have equal ranges or equal amounts of data in each range? Why?

kjetil b halvorsen
Kev.D
  • Why do this anyway? – Nick Cox Apr 13 '18 at 11:07
  • Possible duplicate of [What is the benefit of breaking up a continuous predictor variable?](https://stats.stackexchange.com/questions/68834/what-is-the-benefit-of-breaking-up-a-continuous-predictor-variable) – kjetil b halvorsen Apr 13 '18 at 12:47
  • There is much good advice in the thread identified as duplicate, but it isn't what the OP is asking at all. A duplicate isn't ever the different question you should be asking. – Nick Cox Apr 13 '18 at 13:00


My answer is: it depends.

It depends on why you are discretizing the variable. If you subsequently want to build a statistical classifier from the discretized variable(s), you need to choose a binning (that is, the full set of thresholds defining the interval bins) that is optimal for discriminating between the categories. If your purpose is merely to display the variable's distribution in a histogram, you will often prefer a uniform binning, where every bin has the same width.
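To make the trade-off concrete, here is a minimal Python sketch (not part of the original answer) that compares equal-width binning with equal-frequency (quantile) binning on left-skewed data. The function names `equal_width_edges`, `equal_frequency_edges` and `bin_counts`, the choice of 4 bins, and the Beta(5, 1) example distribution are all illustrative assumptions.

```python
import bisect
import random

def equal_width_edges(values, k):
    """Return k+1 edges splitting [min, max] into k bins of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    # Use the exact maximum as the last edge to avoid floating-point drift.
    return [lo + i * step for i in range(k)] + [hi]

def equal_frequency_edges(values, k):
    """Return k+1 edges so that each bin holds roughly len(values) / k points."""
    s = sorted(values)
    n = len(s)
    # Interior edges are the order statistics at ranks n/k, 2n/k, ...
    interior = [s[(i * n) // k] for i in range(1, k)]
    return [s[0]] + interior + [s[-1]]

def bin_counts(values, edges):
    """Count values per bin; bins are [e_i, e_{i+1}), with the last bin closed."""
    k = len(edges) - 1
    counts = [0] * k
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        # Clamp so the maximum value lands in the last bin.
        counts[min(max(idx, 0), k - 1)] += 1
    return counts

if __name__ == "__main__":
    random.seed(0)
    # Beta(5, 1) is left-skewed: most mass near 1, long tail towards 0.
    data = [random.betavariate(5, 1) for _ in range(1000)]
    for name, fn in [("equal width", equal_width_edges),
                     ("equal frequency", equal_frequency_edges)]:
        edges = fn(data, 4)
        print(name, [round(e, 3) for e in edges], bin_counts(data, edges))
```

On left-skewed data the equal-width bins leave the lowest bins nearly empty, while the equal-frequency bins each hold a quarter of the points but have very different widths. Neither uses class labels; for the classifier case the thresholds would instead be chosen with the labels in view (supervised discretization), which this sketch does not attempt.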

Match Maker EE