I believe dichotomizing continuous variables (also called bucketing or binning) is not always a good idea. When building regression models, my colleague always bins continuous variables and keeps only the resulting dichotomous variables in the final model. My counterarguments to him:
- It loses a lot of valuable information and reduces the predictive power of the variable.
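- It may cause some customers to get a probability score based only on the intercept (those who fall into the reference category of every binned variable, so all of their dummy variables are 0).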
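Both points can be illustrated with a small simulation. This is a minimal sketch, not my colleague's actual setup: it assumes a logistic-regression scoring model, a single normally distributed predictor whose true effect on the log-odds is linear, and a median split as the dichotomization. The helpers `fit_logistic` (plain Newton-Raphson) and `log_likelihood` are hypothetical names written for this illustration. It compares the fitted log-likelihood with the continuous versus the dichotomized predictor, and checks that after dichotomizing, everyone in the reference group receives the intercept-only score.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                      # IRLS weights
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * W[:, None])          # observed information X'WX
        beta += np.linalg.solve(hess, grad)    # Newton step
    return beta

def log_likelihood(X, y, beta):
    eta = X @ beta
    # sum of y*eta - log(1 + exp(eta)), computed stably with logaddexp
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
# Assumed truth for the simulation: log-odds linear in x
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x)))
y = rng.binomial(1, p_true).astype(float)

ones = np.ones(n)
X_cont = np.column_stack([ones, x])
# Median split: dummy is 1 above the median, 0 otherwise (reference group)
x_bin = (x > np.median(x)).astype(float)
X_bin = np.column_stack([ones, x_bin])

beta_cont = fit_logistic(X_cont, y)
beta_bin = fit_logistic(X_bin, y)
ll_cont = log_likelihood(X_cont, y, beta_cont)
ll_bin = log_likelihood(X_bin, y, beta_bin)

p_bin = 1.0 / (1.0 + np.exp(-X_bin @ beta_bin))
print(f"log-likelihood, continuous x:   {ll_cont:.1f}")
print(f"log-likelihood, dichotomized x: {ll_bin:.1f}")
print("distinct scores after dichotomizing:", np.unique(np.round(p_bin, 6)).size)
# Everyone in the reference group gets sigmoid(intercept) and nothing else
print("reference-group score = sigmoid(intercept):",
      np.allclose(p_bin[x_bin == 0], 1.0 / (1.0 + np.exp(-beta_bin[0]))))
```

Both models have two parameters, so the gap in log-likelihood is purely the information thrown away by the split; and the dichotomized model can only ever produce two distinct scores.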
His arguments:
- You don't have to worry about non-linearity, and there is no need to transform variables.
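Binning is not the only way to avoid assuming linearity; a spline term handles non-linearity while keeping the variable continuous. A minimal sketch, assuming a linear-regression setting, a made-up sine-shaped relationship, quintile bins, and a restricted cubic spline with five knots at the 5/27.5/50/72.5/95% quantiles (the helper names `rcs_basis` and `rss` are hypothetical). Both models use five parameters, and the comparison is in-sample residual sum of squares.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted (natural) cubic spline basis: k knots -> k-1 columns,
    constrained to be linear beyond the outer knots."""
    t = np.asarray(knots, dtype=float)
    k = len(t)

    def pos3(u):
        return np.maximum(u, 0.0) ** 3

    cols = [x]
    for j in range(k - 2):
        cols.append(
            pos3(x - t[j])
            - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
            + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2])
        )
    # Scale the non-linear terms for numerical stability
    cols[1:] = [c / (t[-1] - t[0]) ** 2 for c in cols[1:]]
    return np.column_stack(cols)

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-3, 3, size=n)
y = np.sin(x) + rng.normal(scale=0.5, size=n)  # assumed non-linear truth

ones = np.ones((n, 1))
# Binning: quintile dummies (first quintile is the reference), 5 parameters
edges = np.quantile(x, [0.2, 0.4, 0.6, 0.8])
bin_idx = np.digitize(x, edges)  # values 0..4
X_bins = np.hstack([ones, (bin_idx[:, None] == np.arange(1, 5)).astype(float)])
# Restricted cubic spline with 5 knots, 5 parameters
knots = np.quantile(x, [0.05, 0.275, 0.5, 0.725, 0.95])
X_rcs = np.hstack([ones, rcs_basis(x, knots)])

rss_bins = rss(X_bins, y)
rss_rcs = rss(X_rcs, y)
print(f"RSS, quintile bins (5 params): {rss_bins:.1f}")
print(f"RSS, RCS with 5 knots (5 params): {rss_rcs:.1f}")
```

The bins force a step function with jumps at arbitrary cut points, whereas the spline follows the curve smoothly with the same number of parameters.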
How can I convince him that it's not good practice? Is there a good research paper on this? What are the drawbacks of having only dichotomous variables in the final model?