
I have a continuous independent variable that is used to explain a binary dependent variable in a logistic regression. The model user requires this continuous variable to be grouped into 6 bins. I use the area under the ROC curve (AUC) as the model performance measure, so I would like to choose the bins so as to maximize AUC.

I have been proceeding by trial and error: for example, creating roughly equal numbers of observations in the middle buckets while keeping the top and bottom buckets separate, since most of the events (1s) fall in the top bucket and few fall in the bottom bucket. My continuous variable thus roughly corresponds to an estimated probability of the event, but it is not very accurate (it comes from another model).

What is the usual way to approach such a problem? I am using R.
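
To make the objective concrete, here is a minimal sketch of the kind of search I have in mind. It is written in Python for brevity, but the logic should port directly to R. It rests on several assumptions that are not part of my actual setup: the binned variable is the only predictor and enters as dummy-coded bins (so each observation is effectively scored by its bin's event rate), candidate cut points are restricted to 11 quantiles of the variable, and the data are synthetic. The names `binned_auc` and `best_cuts` are hypothetical helpers, not functions from any package.

```python
import bisect
import itertools
import random


def binned_auc(x, y, cuts):
    """AUC when each observation is scored by the event rate of its bin."""
    cuts = sorted(cuts)
    n_bins = len(cuts) + 1
    pos = [0] * n_bins  # events (y == 1) per bin
    neg = [0] * n_bins  # non-events (y == 0) per bin
    for xi, yi in zip(x, y):
        b = bisect.bisect_right(cuts, xi)  # index of the bin containing xi
        if yi == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    # Rank non-empty bins by event rate, as dummy-coded bins would be scored.
    order = sorted(
        (b for b in range(n_bins) if pos[b] + neg[b] > 0),
        key=lambda b: pos[b] / (pos[b] + neg[b]),
    )
    # Mann-Whitney count: an event beats every non-event in a lower-rated bin
    # and ties (half credit) with the non-events in its own bin.
    concordant, neg_below = 0.0, 0
    for b in order:
        concordant += pos[b] * (neg_below + 0.5 * neg[b])
        neg_below += neg[b]
    return concordant / (sum(pos) * sum(neg))


def best_cuts(x, y, n_bins=6, n_candidates=11):
    """Exhaustive search over quantile-based candidate cut points."""
    xs = sorted(x)
    candidates = sorted(
        {xs[len(xs) * k // (n_candidates + 1)] for k in range(1, n_candidates + 1)}
    )
    best = max(
        itertools.combinations(candidates, n_bins - 1),
        key=lambda c: binned_auc(x, y, c),
    )
    return list(best), binned_auc(x, y, best)


# Synthetic data (an assumption): events concentrate at high values of x.
random.seed(0)
x = [random.random() for _ in range(1000)]
y = [1 if random.random() < xi ** 3 else 0 for xi in x]

cuts, auc = best_cuts(x, y)
print("cut points:", [round(c, 3) for c in cuts])
print("in-sample AUC:", round(auc, 4))
```

With 11 candidates and 5 cut points this evaluates 462 combinations, which is cheap; a finer candidate grid grows combinatorially and would need a greedy or dynamic-programming search instead. The AUC reported here is in-sample, so it is likely to be optimistic.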

adam
  • (1) There's no requirement to bin continuous independent variables for logistic regression - see [What is the benefit of breaking up a continuous predictor variable?](http://stats.stackexchange.com/q/68834/17230). (2) Using the data to define the independent variable in terms of the dependent variable (a) invalidates the usual standard error estimates; & (b) introduces an optimistic bias into the resulting Gini index - as an estimate of the model's predictive performance on future data - (so make sure to validate any such procedure if you use it). – Scortchi - Reinstate Monica Aug 06 '15 at 16:41
  • [optimal binning in R](http://stats.stackexchange.com/q/119974/17230) may be relevant. – Scortchi - Reinstate Monica Aug 06 '15 at 16:46
  • The requirement to use binning for this variable comes from the model user. – adam Aug 06 '15 at 20:36
  • [smbinning](http://stats.stackexchange.com/questions/148769/optimal-binning-with-respect-to-a-given-response-variable) does something similar based on information value. Information value is related to Gini according to [what-is-the-relationship-between-the-gini-score-and-the-log-likelihood-ratio](http://stats.stackexchange.com/questions/94886/what-is-the-relationship-between-the-gini-score-and-the-log-likelihood-ratio) – adam Aug 07 '15 at 14:18
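
The optimism that Scortchi's comment warns about can be checked with a holdout split. The following is a minimal sketch under stated assumptions, not a definitive procedure: it is in Python (the logic ports to R), the data are synthetic, the cut points are chosen by exhaustive search over 11 quantile candidates, and the binned variable is treated as the only predictor, scored by per-bin event rates learned on the training half. The helper names `auc`, `bin_rates`, `score` and `fit_cuts` are hypothetical.

```python
import bisect
import itertools
import random


def auc(scores, labels):
    """AUC with tied scores counted as half (Mann-Whitney)."""
    by_score = {}
    for s, l in zip(scores, labels):
        p, n = by_score.get(s, (0, 0))
        by_score[s] = (p + l, n + 1 - l)
    concordant, neg_below = 0.0, 0
    for s in sorted(by_score):
        p, n = by_score[s]
        concordant += p * (neg_below + 0.5 * n)
        neg_below += n
    n_pos = sum(labels)
    return concordant / (n_pos * (len(labels) - n_pos))


def bin_rates(x, y, cuts):
    """Event rate per bin; an empty bin falls back to the overall rate."""
    total = [0] * (len(cuts) + 1)
    events = [0] * (len(cuts) + 1)
    for xi, yi in zip(x, y):
        b = bisect.bisect_right(cuts, xi)
        total[b] += 1
        events[b] += yi
    overall = sum(y) / len(y)
    return [e / t if t else overall for e, t in zip(events, total)]


def score(x, cuts, rates):
    """Score each observation by the (training) event rate of its bin."""
    return [rates[bisect.bisect_right(cuts, xi)] for xi in x]


def fit_cuts(x, y, n_bins=6, n_candidates=11):
    """Pick the cut points that maximize AUC on the data passed in."""
    xs = sorted(x)
    cand = sorted(
        {xs[len(xs) * k // (n_candidates + 1)] for k in range(1, n_candidates + 1)}
    )
    return max(
        (list(c) for c in itertools.combinations(cand, n_bins - 1)),
        key=lambda c: auc(score(x, c, bin_rates(x, y, c)), y),
    )


# Synthetic data (an assumption): events concentrate at high values of x.
random.seed(1)
x = [random.random() for _ in range(2000)]
y = [1 if random.random() < xi ** 3 else 0 for xi in x]
x_train, y_train, x_test, y_test = x[:1000], y[:1000], x[1000:], y[1000:]

# Cut points and bin rates are learned on the training half only.
cuts = fit_cuts(x_train, y_train)
rates = bin_rates(x_train, y_train, cuts)
train_auc = auc(score(x_train, cuts, rates), y_train)
test_auc = auc(score(x_test, cuts, rates), y_test)
print("train AUC:", round(train_auc, 4), "holdout AUC:", round(test_auc, 4))
```

The gap between the two printed values gives a rough idea of how much of the in-sample AUC comes from tuning the bins to the data. Cross-validation or bootstrapping the whole binning procedure would give a more stable estimate than a single split.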

0 Answers