I'm looking for an optimal binning (discretization) method for a continuous variable with respect to a given binary response (target) variable, with the maximum number of intervals as a parameter.
Example: I have a set of observations of people with the variables "height" (numeric, continuous) and "has_back_pains" (binary). I want to discretize height into at most 3 intervals (groups) with different proportions of people with back pains, so that the algorithm maximizes the difference between the groups (subject to given restrictions, for instance that each interval has at least x observations).
The obvious solution to this problem would be a decision tree (a simple one-variable model), but I can't find any function in R that takes a "maximal number of branches" as a parameter: all of them split the variable into 2 groups (<=x and >x). SAS Enterprise Miner has a "maximum branch" parameter, but I'm looking for a non-commercial solution.
Some of my variables have just a few unique values (and could be treated as discrete variables), but I want to discretize them as well into a smaller number of intervals.
The closest solution to my problem is implemented in the smbinning package in R (which relies on the ctree function from the party package), but it has two drawbacks: it's impossible to set the number of intervals directly (although you can work around this by changing the p parameter), and it doesn't work when the data vector has fewer than 10 unique values. Anyway, you can see example output here (the Cutpoint and Odds columns are the crucial ones):
  Cutpoint CntRec CntGood CntBad CntCumRec CntCumGood CntCumBad PctRec BadRate   Odds  LnOdds     WoE     IV
1   <= 272   9081     169   8912      9081        169      8912 0.1874  0.9814 0.0190 -3.9653 -0.6527 0.0596
2   <= 311   8541     246   8295     17622        415     17207 0.1762  0.9712 0.0297 -3.5181 -0.2055 0.0068
3   <= 335   2986     163   2823     20608        578     20030 0.0616  0.9454 0.0577 -2.8518  0.4608 0.0163
4  Missing  27852    1125  26727     48460       1703     46757 0.5747  0.9596 0.0421 -3.1679  0.1447 0.0129
5    Total  48460    1703  46757        NA         NA        NA 1.0000  0.9649 0.0364 -3.3126  0.0000 0.0956
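To make explicit how I read the Odds, LnOdds, WoE and IV columns, here is a minimal sketch in Python (standard library only, used purely for illustration rather than R). The formulas are my own reading of the numbers in the table above, not taken from the smbinning source, and the name `woe_table` is made up for this example. It appears to reproduce the table's values up to rounding.

```python
import math

# (Cutpoint, CntGood, CntBad) copied from the table above.
bins = [
    ("<= 272", 169, 8912),
    ("<= 311", 246, 8295),
    ("<= 335", 163, 2823),
    ("Missing", 1125, 26727),
]

def woe_table(bins):
    """Recompute Odds, LnOdds, WoE and IV per bin from good/bad counts.

    Assumed formulas (inferred from the table, not from package code):
      Odds   = good / bad
      LnOdds = ln(Odds)
      WoE    = LnOdds - ln(total_good / total_bad)
      IV     = (good / total_good - bad / total_bad) * WoE
    """
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    total_ln_odds = math.log(total_good / total_bad)
    rows = []
    for label, good, bad in bins:
        odds = good / bad
        ln_odds = math.log(odds)
        woe = ln_odds - total_ln_odds
        iv = (good / total_good - bad / total_bad) * woe
        rows.append({"cutpoint": label, "odds": odds,
                     "ln_odds": ln_odds, "woe": woe, "iv": iv})
    return rows

rows = woe_table(bins)
for r in rows:
    print(f'{r["cutpoint"]:>8} {r["odds"]:.4f} {r["ln_odds"]:.4f} '
          f'{r["woe"]:.4f} {r["iv"]:.4f}')
print("Total IV:", round(sum(r["iv"] for r in rows), 4))
```

Each bin needs at least one good and one bad record for the logarithm to be defined, which holds for every row here.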
Oh, and I'm fully aware that binning results in information loss and that there are better methods, but I'm going to use it for data visualization and treat those variables as factors.
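To make the objective concrete, here is a rough sketch of the kind of procedure I have in mind, written in Python (standard library only) purely as an illustration. Everything in it is an assumption on my part: the name `optimal_bins`, the parameters `max_bins` and `min_size`, and in particular the choice of "difference between the groups" as the between-bin sum of squares of the event rate (the usual regression-tree criterion). It uses dynamic programming over the sorted unique values, so it also works for variables with only a few unique values.

```python
from itertools import accumulate

def optimal_bins(x, y, max_bins=3, min_size=1):
    """Return interior cut points (bins are x <= cut) for at most
    `max_bins` intervals, each with at least `min_size` observations.

    Assumed objective: maximize sum over bins of positives^2 / size,
    which is equivalent to maximizing the between-bin sum of squares
    of the event rate. `y` must contain 0/1 values.
    """
    # Aggregate by unique value so tied observations never straddle a cut.
    values = sorted(set(x))
    index = {v: i for i, v in enumerate(values)}
    cnt = [0] * len(values)
    pos = [0] * len(values)
    for xi, yi in zip(x, y):
        cnt[index[xi]] += 1
        pos[index[xi]] += yi
    n = [0] + list(accumulate(cnt))  # prefix observation counts
    s = [0] + list(accumulate(pos))  # prefix positive counts
    m = len(values)
    neg_inf = float("-inf")

    # best[k][j]: best score for the first j unique values in exactly k bins.
    best = [[neg_inf] * (m + 1) for _ in range(max_bins + 1)]
    prev = [[None] * (m + 1) for _ in range(max_bins + 1)]
    best[0][0] = 0.0
    for k in range(1, max_bins + 1):
        for j in range(1, m + 1):
            for i in range(k - 1, j):
                size = n[j] - n[i]
                if best[k - 1][i] == neg_inf or size < min_size:
                    continue
                score = best[k - 1][i] + (s[j] - s[i]) ** 2 / size
                if score > best[k][j]:
                    best[k][j] = score
                    prev[k][j] = i

    # Pick the best feasible bin count; ties go to fewer bins.
    k = max(range(1, max_bins + 1), key=lambda b: best[b][m])
    if best[k][m] == neg_inf:
        raise ValueError("no binning satisfies min_size")

    # Walk back through the stored split positions to recover the cuts.
    uppers, j = [], m
    while k > 0:
        uppers.append(values[j - 1])
        j, k = prev[k][j], k - 1
    return uppers[::-1][:-1]  # drop the last upper bound (the maximum of x)

height = [150, 155, 160, 165, 170, 175, 180, 185, 190]
has_back_pains = [0, 0, 0, 1, 1, 1, 0, 0, 0]
print(optimal_bins(height, has_back_pains, max_bins=3))
```

The run time grows as max_bins times the square of the number of unique values, so for a truly continuous variable the values would probably need to be pre-binned into fine quantiles first. Missing values are not handled here and would need their own group, as in the smbinning output.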