
For this project I was required to create a credit risk scorecard with the 4 most relevant variables, so I binned all variables and selected them using the chi-squared statistic and information value (IV).
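
For reference, here is a minimal sketch of how weight of evidence (WoE) and IV are usually computed for one binned variable. It assumes per-bin counts of goods (non-defaulters) and bads (defaulters) and the sign convention WoE = ln(%good / %bad). Some texts use the reverse, which flips the WoE signs but leaves IV unchanged. The function name `woe_iv` and the counts in the usage line are hypothetical.

```python
import math


def woe_iv(goods, bads):
    """Weight of evidence per bin and total information value for one variable.

    goods[i], bads[i]: counts of non-defaulters / defaulters in bin i.
    Convention assumed here: WoE = ln(%good / %bad).
    """
    total_good = sum(goods)
    total_bad = sum(bads)
    woe = []
    iv = 0.0
    for g, b in zip(goods, bads):
        if g == 0 or b == 0:
            # log(0) is undefined; real scorecards smooth or merge such bins.
            raise ValueError("every bin needs at least one good and one bad")
        dist_good = g / total_good  # share of all goods that fall in this bin
        dist_bad = b / total_bad    # share of all bads that fall in this bin
        w = math.log(dist_good / dist_bad)
        woe.append(w)
        iv += (dist_good - dist_bad) * w  # each term is >= 0
    return woe, iv
```

For example, `woe, iv = woe_iv([300, 250, 150, 100], [50, 40, 60, 50])` would score four hypothetical Purpose bins; a bin holding a larger share of bads than goods gets a negative WoE.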

I ran logistic and linear regressions with a dummy for each bin of these variables except one, which served as the reference (so if Var1 had 3 bins, I would include 2 dummies for it). I found that some of these bin dummies were not significant, so I wonder what should be done in that case, because in my opinion it doesn't make much sense to "remove" a bin from a variable.

As an example, let's say the variable is Purpose (of the loan), and the bands are the following:

a) car/electronics
b) house 
c) furniture/remodeling
d) education/business

In my regression I would include dummies for bands a, b and c (so band d is the reference case, coded 0 on all three dummies). Let's say that the dummy for band b is not significant, with a very large p-value. What can be done in that case? Normally I would remove a non-significant variable, but since this dummy is technically part of one variable, what is the procedure?
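
A minimal sketch of the reference (treatment) coding described above, with band d as the reference level; `dummy_encode` is a hypothetical helper, not a library function. It makes one consequence visible: if the b column were dropped, rows in band b would be coded exactly like rows in band d, so "removing" the dummy amounts to merging band b into the reference band. It also shows that the b dummy's p-value only tests b against d, so it depends on which band is chosen as the reference.

```python
def dummy_encode(values, levels, reference):
    """Treatment coding: one 0/1 column per level other than `reference`."""
    if reference not in levels:
        raise ValueError("reference must be one of the levels")
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"values not in levels: {sorted(unknown)}")
    columns = [lvl for lvl in levels if lvl != reference]
    # A row in the reference band gets 0 in every column.
    rows = [[1 if v == lvl else 0 for lvl in columns] for v in values]
    return columns, rows
```

With `dummy_encode(["a", "d", "b", "c"], ["a", "b", "c", "d"], "d")` the columns are a, b and c, and the row for the band-d observation is all zeros.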

amestrian
  • [Don't bin your continuous data](https://stats.stackexchange.com/q/68834/1352). Feed them into your algorithm as-is; potentially transform them using (e.g.) restricted cubic splines (see, e.g., Frank Harrell's *Regression Modeling Strategies*) to capture any nonlinearity. [Cf. here.](https://stats.meta.stackexchange.com/q/5000/1352) – Stephan Kolassa Aug 07 '20 at 14:50
  • @StephanKolassa thank you! Let's say the variable wasn't continuous but categorical, as in "purpose of the loan", where 1 = car, 2 = house, 3 = education, etc. I believe it's okay to group levels into bins in that case, but the situation I described still happens. Do you have any idea how to handle it? – amestrian Aug 07 '20 at 14:59
  • @StephanKolassa I edited the original post for that case :) – amestrian Aug 07 '20 at 15:11
  • It makes little sense to test for significance of individual values of a categorical variable: either you include the variable, with all its possible values, or not. You should therefore test all these values at once rather than individually (using an LR test for logistic regression and an F test for OLS regression). – whuber Aug 07 '20 at 15:22
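
Stephan Kolassa's alternative to binning a continuous predictor can be sketched with the truncated-power basis for a restricted (natural) cubic spline, which, as far as I know, is how Harrell's book presents it. The knot positions are an assumption you choose (often quantiles of x), and `rcs_basis` is a hypothetical helper. Each observation's basis values would replace that variable's bin dummies as regression columns, and the fitted curve is constrained to be linear below the first knot and above the last.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis [x, s_1(x), ..., s_{k-2}(x)] for k knots."""
    t = sorted(knots)
    k = len(t)
    if k < 3 or len(set(t)) != k:
        raise ValueError("need at least 3 distinct knots")

    def pos3(u):
        # Truncated cubic: (u)_+^3
        return max(u, 0.0) ** 3

    last_gap = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # keeps terms on roughly the scale of x
    basis = [x]
    for j in range(k - 2):
        # The two correction terms cancel the cubic and quadratic parts
        # beyond the last knot, which makes the spline linear there.
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / last_gap
                + pos3(x - t[-1]) * (t[-2] - t[j]) / last_gap)
        basis.append(term / scale)
    return basis
```

Dividing by (t_k - t_1)^2 only rescales the columns, so it changes the coefficients but not the fitted curve.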

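whuber's joint test can be illustrated in the simplest setting: a logistic regression whose only predictor is the categorical variable. There the fitted default rate in each band equals that band's observed rate, and the intercept-only model's equals the overall rate, so both log-likelihoods have closed forms and LR = 2(ll_full - ll_null) is compared with a chi-squared distribution with (number of bands - 1) degrees of freedom. This is a sketch under that assumption; with the other scorecard variables in the model you would fit both models (with and without all the Purpose dummies) in your regression software and compare their log-likelihoods the same way. The helper names are hypothetical, and `chi2_sf` is a hand-rolled closed form for integer degrees of freedom, used only to keep the example dependency-free.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared(df) variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction sum.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range((df - 1) // 2):
        total += term
        term *= x / (2 * i + 3)
    return total


def lr_test_categorical(categories, outcomes):
    """Joint LR test of all dummies of one categorical predictor (0/1 outcome)."""
    if any(y not in (0, 1) for y in outcomes):
        raise ValueError("outcomes must be 0/1")

    def loglik(events, n):
        # Binomial log-likelihood at the MLE p = events / n (0 * log 0 = 0).
        ll = 0.0
        if events > 0:
            ll += events * math.log(events / n)
        if n - events > 0:
            ll += (n - events) * math.log((n - events) / n)
        return ll

    n_by_cat = Counter(categories)
    events_by_cat = Counter(c for c, y in zip(categories, outcomes) if y == 1)
    df = len(n_by_cat) - 1  # number of dummies removed in the reduced model
    if df < 1:
        raise ValueError("need at least two categories")
    ll_full = sum(loglik(events_by_cat[c], n) for c, n in n_by_cat.items())
    ll_null = loglik(sum(events_by_cat.values()), len(categories))
    stat = 2 * (ll_full - ll_null)
    return stat, df, chi2_sf(stat, df)
```

With hypothetical bands where b's default rate (22%) is close to d's (18%) but c's (40%) is not, the b dummy would look non-significant on its own, while this joint test on all three dummies can still reject, which supports keeping Purpose with all its bands.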