I am trying to understand the basics of how and when it is OK to discretize a variable.

Below are some papers that support supervised discretization:

Improving Classification Performance with Discretization on Biomedical Datasets

Feature selection via discretization

On the other hand, there are:

Visual Revelations

Frank Harrell's page on problems caused by discretization

and a lot of other posts that discourage binning:

What is the benefit of breaking up a continuous predictor variable?

Is binning of continuous data always bad for statistical tests?

So, is it correct that choosing the bins using the target class (supervised discretization) can help feature selection in classification, whereas arbitrary binning of continuous values does not?
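To make the distinction concrete, here is a minimal sketch of what I mean by the two kinds of binning. It assumes a single entropy-minimizing cut point as a stand-in for supervised discretization (the cited papers use more elaborate methods), and the toy data, function names, and the boundary at 2.0 are all hypothetical.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def split_entropy(xs, ys, cut):
    """Weighted class entropy of the two bins produced by cutting at `cut`."""
    left = [y for x, y in zip(xs, ys) if x <= cut]
    right = [y for x, y in zip(xs, ys) if x > cut]
    n = len(ys)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def best_supervised_cut(xs, ys):
    """Return the midpoint between consecutive distinct values of x
    that gives the lowest weighted class entropy."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda c: split_entropy(xs, ys, c))


# Hypothetical toy data: the class flips at x = 2.0, while x spans [0, 10).
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [1 if x > 2.0 else 0 for x in xs]

supervised_cut = best_supervised_cut(xs, ys)   # uses the target
equal_width_cut = (min(xs) + max(xs)) / 2      # ignores the target

print("supervised cut: ", round(supervised_cut, 2),
      "entropy:", round(split_entropy(xs, ys, supervised_cut), 3))
print("equal-width cut:", round(equal_width_cut, 2),
      "entropy:", round(split_entropy(xs, ys, equal_width_cut), 3))
```

In this toy setting the supervised cut lands next to the class boundary and yields pure bins, while the equal-width cut mixes both classes in one bin. Whether that advantage carries over to real, noisy data is exactly what I am unsure about.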

There is also an argument that tree models bin implicitly, since each split is a threshold on a feature, and that pre-binning therefore withholds information from the model:

Does binning of ranges make sense for a Random Forest?
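As I understand that argument, it can be sketched with a one-split rule (a decision stump) standing in for a tree. This is only an illustration under my own assumptions: the data, the bin width of 2.5, and the helper `stump_accuracy` are hypothetical, and a real Random Forest is far more complex.

```python
import random


def stump_accuracy(xs, ys):
    """Best training accuracy of the rule 'predict 1 if x > t',
    searched over midpoints between consecutive distinct values of x."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = 0.0
    for t in candidates:
        correct = sum((x > t) == bool(y) for x, y in zip(xs, ys))
        best = max(best, correct / len(ys))
    return best


# Hypothetical toy data: the true class boundary is at x = 2.0.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(500)]
ys = [1 if x > 2.0 else 0 for x in xs]

# Pre-bin into equal-width bins of width 2.5; the edges (2.5, 5, 7.5)
# do not coincide with the true boundary.
binned = [int(x // 2.5) for x in xs]

raw_acc = stump_accuracy(xs, ys)         # the stump can choose any threshold
binned_acc = stump_accuracy(binned, ys)  # the stump can only cut at bin edges

print("accuracy on raw values:   ", raw_acc)
print("accuracy on binned values:", binned_acc)
```

On the raw values the stump can place its threshold at the true boundary, but on the pre-binned values it is restricted to the bin edges, so points between 2.0 and 2.5 are misclassified.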

Appreciate any clarification.

learner
  • See: (almost a dup?) https://stats.stackexchange.com/questions/230750/when-should-we-discretize-bin-continuous-independent-variables-features-and-when – kjetil b halvorsen Jul 24 '20 at 17:36
