SPSS has an optimal binning function that helps categorize continuous predictors into meaningful intervals when a binary response variable exists. I was looking for an equivalent function in R, but I can't find one. I'm not sure whether bins derived by CART or CTREE would be equivalent.
-
In practice very, very few people know both SPSS and R in any depth. I think you would need to be much more precise about what this "optimal binning" is to get an answer. That aside, binning a continuous predictor is widely deprecated as very poor statistical practice, fairly so in my view. http://biostat.mc.vanderbilt.edu/wiki/Main/CatContinuous is a good introduction. In addition, "optimal binning" (not your name, presumably) is a loaded term! – Nick Cox Oct 14 '14 at 10:13
-
See also e.g. http://stats.stackexchange.com/questions/68834/what-is-the-benefit-of-breaking-up-a-continuous-predictor-variable in this forum. – Nick Cox Oct 14 '14 at 10:35
-
I agree that restricted cubic splines or nonparametric smoothers account for non-linearity better. Nevertheless, the algorithm that this analysis will derive cannot make use of such smoothers. – Giorgio Spedicato Oct 14 '14 at 15:05
-
There is a `cut` function, and in the documentation (`?hist`) you can find information about algorithms that choose an "optimal" number of bins for a histogram. See also http://stats.stackexchange.com/questions/163778/how-do-you-find-a-cutting-point-strong-slope-within-one-dimensional-data/163787#163787 – Tim Oct 17 '15 at 07:57
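As a rough illustration of the `cut` suggestion above, here is a minimal base R sketch. The data are simulated and the choice of quartile breaks is only an assumption for the example. Note that this kind of binning ignores the response variable, so it is not a substitute for the supervised binning the question asks about.

```r
set.seed(1)
x <- rnorm(200)   # made-up continuous predictor

# Unsupervised binning: quartile breaks, with the lowest value included
breaks <- quantile(x, probs = seq(0, 1, by = 0.25))
bins <- cut(x, breaks = breaks, include.lowest = TRUE)
table(bins)

# Number of bins suggested by the rules documented under ?hist
n_bins <- c(Sturges = nclass.Sturges(x),
            Scott   = nclass.scott(x),
            FD      = nclass.FD(x))
n_bins
```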
2 Answers
There is now a package called "smbinning" (Optimal Binning for Scoring Modeling), available since early 2015. It gives you the optimal cut points for a numeric variable; more precisely, it optimizes the information value. It can handle categorical variables and missing values as well.
For example:
smbinning(df, y, x, p = 0.05)
- df: data frame
- y: name of the binary dependent variable (column of df, given as a string)
- x: name of the numeric independent variable (column of df, given as a string)
- p: percentage of records per bin
It returns a list that contains the information value, the information value table, and other items. You can find details in the documentation on CRAN or at http://www.scoringmodeling.com/
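To make the "information value" above concrete, here is a minimal base R sketch that computes it for one candidate binning. It is not smbinning's code: the data are simulated, the quintile cut points are arbitrary, and it assumes the usual scorecard definition, IV = sum((%good - %bad) * log(%good / %bad)).

```r
set.seed(42)
n <- 1000
x <- runif(n, 0, 100)                      # hypothetical numeric predictor
y <- rbinom(n, 1, plogis(-2 + 0.04 * x))   # hypothetical binary response (1 = good)

# Candidate bins: quintiles here; any set of cut points could be scored this way
bins <- cut(x, breaks = quantile(x, probs = seq(0, 1, by = 0.2)),
            include.lowest = TRUE)

tab  <- table(bins, y)                     # counts per bin and class
good <- as.vector(tab[, "1"])
bad  <- as.vector(tab[, "0"])
pct_good <- good / sum(good)
pct_bad  <- bad / sum(bad)

woe <- log(pct_good / pct_bad)             # weight of evidence per bin
iv  <- sum((pct_good - pct_bad) * woe)     # information value of this binning

iv_table <- data.frame(bin = levels(bins), good, bad, woe = round(woe, 3))
iv_table
iv
```

A bin with no goods or no bads would give an infinite weight of evidence, so real data may need bins merged or a small adjustment before this calculation.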

Anthony Lei
-
To be honest, I'm not the biggest fan of the smbinning package. I haven't coded anything better, but the code in the package feels "amateurish", and it fails in many of the test cases I tried. I don't recommend smbinning at v0.2. – xiaodai Nov 24 '15 at 02:51
You can try the discretization package and its cutPoints function: http://cran.r-project.org/web/packages/discretization/discretization.pdf
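As far as I understand it, cutPoints is based on entropy and the minimum description length principle. The base R sketch below is not the package's code; it only illustrates the basic idea of picking a single cut point that minimises the class entropy of the two resulting bins. The helper names `entropy` and `best_cut` are made up for this example, and the recursive splitting and stopping rule are left out.

```r
# Shannon entropy (in bits) of a class vector
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Best single cut: the midpoint between adjacent distinct x values that
# minimises the weighted class entropy of the two resulting bins
best_cut <- function(x, y) {
  xs <- sort(unique(x))
  if (length(xs) < 2) return(NA_real_)
  candidates <- (head(xs, -1) + tail(xs, -1)) / 2
  score <- sapply(candidates, function(cp) {
    left <- x <= cp
    mean(left) * entropy(y[left]) + mean(!left) * entropy(y[!left])
  })
  candidates[which.min(score)]
}

x <- c(1, 2, 3, 4, 10, 11, 12, 13)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)
best_cut(x, y)   # 7
```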

Franck Berthuit