Characteristic of good binning for weight of evidence algorithm

Question

I am using logistic regression for classification. To reduce the number of features and improve precision, I am using the Weight of Evidence (WOE) technique, and I need to implement it in Python. As I could not find a readily available binning algorithm, I searched for binning rules and came across this paper:

http://www.m-hikari.com/ams/ams-2014/ams-65-68-2014/zengAMS65-68-2014.pdf

This paper says that:

A good binning algorithm should follow these guidelines:

1. Missing values are binned separately.
2. Each bin should contain at least 5% of observations.
3. No bin should have 0 accounts for good or bad.
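For concreteness, the three guidelines can be checked mechanically once the bins are fixed. The following is only a sketch under stated assumptions: `woe_table`, its parameters and the `ok_size` / `ok_counts` flags are hypothetical names, not taken from the paper, and WOE is computed as ln(share of goods / share of bads), a sign convention that differs between sources.

```python
import math
from bisect import bisect_right


def woe_table(values, is_bad, edges, min_share=0.05):
    """Return one dict per bin with counts, share, WOE and guideline flags.

    values    : list of numbers; None marks a missing value
    is_bad    : list of 0/1 labels (1 = bad, 0 = good)
    edges     : sorted interior cut points; bin k holds edges[k-1] <= x < edges[k]
    min_share : minimum fraction of observations per bin (0.05 = the 5% rule)
    """
    counts = {}  # bin label -> [good count, bad count]
    for x, y in zip(values, is_bad):
        # Guideline 1: missing values get their own bin.
        label = "missing" if x is None else bisect_right(edges, x)
        counts.setdefault(label, [0, 0])[y] += 1

    total_good = sum(c[0] for c in counts.values())
    total_bad = sum(c[1] for c in counts.values())
    n = total_good + total_bad

    rows = []
    # Numeric bins first, in order; the "missing" bin last.
    for label, (good, bad) in sorted(
        counts.items(), key=lambda kv: (kv[0] == "missing", kv[0])
    ):
        share = (good + bad) / n
        has_both = good > 0 and bad > 0  # Guideline 3: no zero counts.
        # WOE is undefined (log of 0 or division by 0) when a count is zero.
        woe = (
            math.log((good / total_good) / (bad / total_bad))
            if has_both
            else None
        )
        rows.append({
            "bin": label,
            "good": good,
            "bad": bad,
            "share": share,
            "woe": woe,
            "ok_size": share >= min_share,  # Guideline 2: minimum bin size.
            "ok_counts": has_both,
        })
    return rows
```

Changing `min_share` to 0.02 or 0.10 only changes which bins get flagged; nothing in the WOE formula itself depends on the 5% figure.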

I don't understand why the second condition is necessary, i.e. that each bin should contain at least 5% of observations. Why 5% in particular? Couldn't I require at least 2% or at least 10% in each bin instead?

Someone told me that each bin will contain more points if we require 5% per bin. Why is it necessary to have more points when the goal is to turn already continuous data into categorical data?

The 5% is almost certainly a rule of thumb that one person made up, and was then propagated through your community, finally finding itself in this paper. It would be neglectful of me to not mention that binning itself is not viewed as a good technique by experienced data scientists and statisticians. For continuous features, it is less efficient at improving goodness of fit than using splines to effect a basis expansion. For categorical features, unless based on prior subject matter expertise, it is less principled than a good regularization strategy. — Matthew Drury, Feb 24 '17 at 04:46
@MatthewDrury, thank you so much for throwing light on it. I am interested to know more about "splines to effect a basis expansion" method. I never heard about it before or may be the name is different. Can you please give me some elementary information about it (please suggest some books or URLs, if any) — Artiga, Feb 24 '17 at 05:00
For the benefits of binning a continuous variable see https://stats.stackexchange.com/questions/68834/what-is-the-benefit-of-breaking-up-a-continuous-predictor-variable/68839#68839 — kjetil b halvorsen, Apr 23 '20 at 04:00

score 7 · Answer 1 · answered Jan 09 '18 at 10:01

The 5% condition is a rule of thumb for Weight of Evidence (WOE) binning. In general, a good WOE binning of a variable should also have the following characteristics:

1. Monotonic increase or decrease in WOE across consecutive bins. This is because WOE is used primarily with logistic/linear regression models, which assume a linear relationship between the log odds and the independent variables.
2. WOE values for different bins should be as diverse as possible. Hence, you should merge consecutive bins that have similar WOE values.
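A minimal Python sketch of the two characteristics above, since the question asks for Python. This is not the algorithm used by any particular package: `is_monotonic`, `merge_similar_bins` and the `min_gap` threshold are hypothetical names chosen for illustration, and WOE is taken as ln(share of goods / share of bads), a sign convention that varies between sources.

```python
import math


def woe(good, bad, total_good, total_bad):
    """WOE of one bin; assumes all four counts are non-zero."""
    return math.log((good / total_good) / (bad / total_bad))


def is_monotonic(seq):
    """True if seq is entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(seq, seq[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def merge_similar_bins(counts, min_gap):
    """Greedily merge adjacent bins whose WOE values are close.

    counts  : list of (good, bad) counts per bin, in bin order, all non-zero
    min_gap : hypothetical threshold; merging stops once every pair of
              neighbouring bins differs in WOE by at least this much
    """
    bins = [list(c) for c in counts]
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)

    while len(bins) > 1:
        woes = [woe(g, b, total_good, total_bad) for g, b in bins]
        gaps = [abs(woes[i + 1] - woes[i]) for i in range(len(bins) - 1)]
        # Pick the neighbouring pair with the most similar WOE.
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] >= min_gap:
            break
        # Merging adds the counts; the totals are unchanged.
        bins[i] = [bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1]]
        del bins[i + 1]
    return [tuple(b) for b in bins]
```

For example, `merge_similar_bins([(90, 10), (88, 12), (70, 30), (50, 50)], min_gap=0.3)` merges only the first two bins, whose WOE values differ by about 0.2, and the WOE sequence of the resulting three bins passes `is_monotonic`.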

Further, if you want an automated approach to WOE binning, check out the following R package: https://CRAN.R-project.org/package=woeR. It lets you choose the minimum percentage of observations in each class, the number of bins you want to start with, and the WOE cutoff for merging consecutive bins.

P.S.: I authored the above R package.

Welcome to CV. Thank you for disclosing that you authored the package. I upvoted you. :-) — Ferdi, Jan 09 '18 at 10:20
