
I believe that dichotomizing (also called bucketing or binning) a continuous variable is not always a good idea. When building a regression model, my colleague always bins the continuous variables and keeps only the dichotomous variables in the final model. My counter-arguments to him:

  1. It loses a lot of valuable information and reduces the predictive power of the variable
  2. It may cause some customers to get a probability score based on the intercept alone
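
To make the first point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the data are simulated (a standard normal predictor with a linear effect plus noise), the cut-off is a median split, and the helper `pearson` is my own. It compares how strongly the outcome correlates with the continuous predictor versus its dichotomized version.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Simulated data (assumption): normal predictor, linear signal plus noise.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]

# Dichotomize at the sample median.
cutoff = sorted(x)[n // 2]
x_binned = [1.0 if xi > cutoff else 0.0 for xi in x]

r_continuous = pearson(x, y)
r_binned = pearson(x_binned, y)

print(f"continuous predictor:   r = {r_continuous:.3f}")
print(f"median-split predictor: r = {r_binned:.3f}")
print(f"ratio binned / continuous: {r_binned / r_continuous:.3f}")
```

For a normally distributed predictor with a linear effect, a median split shrinks the correlation by a factor of about sqrt(2/pi) ≈ 0.80, so the ratio printed above should land close to that. Other distributions, cut-offs, or non-linear relationships would give different numbers.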

His arguments

  1. You don't have to worry about non-linearity, and there is no need to transform variables

How can I convince him that it's not good practice? Is there any good research paper on this? What are the drawbacks of having only dichotomous variables in the final model?
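
To illustrate what I mean by a score based on the intercept alone, here is a small sketch of a logistic model that contains only dichotomized predictors. The intercept, coefficients, cut-offs, and field names are all invented for illustration; they do not come from my colleague's model.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted model: every number below is made up.
INTERCEPT = -2.0
COEFS = {"income_high": 0.9, "age_over_40": 0.6}
# Each dummy is 1 when the raw field exceeds its cut-off, else 0.
CUTOFFS = {"income_high": ("income", 50000), "age_over_40": ("age", 40)}


def score(customer):
    """Probability score from a model with only dichotomous predictors."""
    z = INTERCEPT
    for dummy, (field, cutoff) in CUTOFFS.items():
        if customer[field] > cutoff:
            z += COEFS[dummy]
    return sigmoid(z)


# Two quite different customers who both fall below every cut-off.
customer_a = {"income": 12000, "age": 19}
customer_b = {"income": 49000, "age": 39}

print(score(customer_a), score(customer_b), sigmoid(INTERCEPT))
```

Both customers have all dummies equal to 0, so both get exactly `sigmoid(INTERCEPT)`: the model cannot tell them apart, even though their underlying values differ a lot.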

GeorgeOfTheRF
  • Your colleague should be reminded that dichotomizing data is itself a type of transformation, perhaps sometimes useful, but more often probably not. Finding the appropriate transformation of variables is an important part of data analysis. And if the cut-off point is chosen by inspection of the data, your statistical analyses will be highly suspect. See answers to http://stats.stackexchange.com/questions/92720/when-to-dichotomize-a-variable-for-correlational-analysis/92723#92723 for an amusing analysis of how dichotomization causes problems. – EdM Sep 25 '14 at 18:48
  • I would also add to your arguments (3) can bias interpretations of actual results (e.g. in cases where the relationships are nonlinear). – Alexis Sep 25 '14 at 19:06
  • Try demonstrating how much better predictive accuracy on a holdout sample will be when using the original continuous variables. – rolando2 Sep 25 '14 at 23:17
  • Questions about discretizing continuous data have been asked and answered *many* times here (Frank Harrell to chime in in 3... 2... 1...). @EdM gives one good example; I pointed to another one in flagging this for closure as a duplicate. – Stephan Kolassa Sep 26 '14 at 11:13
  • I am asking a question about building a model with only binary variables. I don't think it has been asked before. Please share the link where it was asked. – GeorgeOfTheRF Sep 26 '14 at 11:27
  • Or [here](http://stats.stackexchange.com/questions/68834/what-is-the-benefit-of-breaking-up-a-continuous-predictor-variable/68839) – Scortchi - Reinstate Monica Sep 29 '14 at 13:01

0 Answers