I am using logistic regression for classification purpose. For reduction of features and better precision I am using Weight of evidence technique. Also I need to use python for this. As there is no readily available algorithm for binning, I was searching for the rules of binning and I came across this:
http://www.m-hikari.com/ams/ams-2014/ams-65-68-2014/zengAMS65-68-2014.pdf
This paper says that:
A good binning algorithm should follow the following guidelines:
Missing values are binned separately.
Each bin should contain at least 5% of observations.
No bins have 0 accounts for good or bad
I don't understand what is the necessity of second condition i.e. each bin should contain at least 5% of observations? why is it necessary to have at least 5% observation in each bin? Can't I have at least 2% in each bin or at least 10% in each bin.
Someone told me that there will be more points if we consider 5% in each bin. Why is it necessary to have more points when you want to make already continuous data into categorical data?