I am fitting a logistic regression. I understand the assumptions of logistic regression regarding outliers and multicollinearity. What I don't understand is how to select variables at the start of model building. Do I need to examine the outcome Y (event) against each independent variable and draw a scatter plot of the two? Or should I plot both events and non-events against each independent variable? What are the criteria for selecting and eliminating variables? I have seen some researchers take the log or exp of x to improve model accuracy. I am aware of the variable selection techniques (backward, forward, and stepwise), but these only come into play once the variables are already included in the model.

Next question: if an independent variable is continuous, we group it into deciles and then look at the relationship between the grouped categories and Y. Suppose the relationship is positive for some categories and negative for others. Should we use two variables, one for the positive relationship and one for the negative? Should we use the numerical values or the categorized values in this case?

Scortchi - Reinstate Monica
Riya
  • See questions with the [model-selection](http://stats.stackexchange.com/questions/tagged/model-selection) tag ([this answer](http://stats.stackexchange.com/questions/20836/20856#20856) is especially good), & [this question](http://stats.stackexchange.com/questions/68834/) on discretizing continuous variables. Selecting, eliminating, & otherwise mucking about with independent variables based on examining their relationship to the dependent variable, then fitting a model to the same data & using it as if you hadn't done any of that can be rather dangerous. – Scortchi - Reinstate Monica Nov 06 '14 at 17:23

1 Answer

If you are trying to select a subset of important features from your training dataset, take a look at lasso (L1) regularization. The L1 penalty shrinks some coefficients exactly to zero, so it performs feature selection automatically as part of fitting the model.

http://en.wikipedia.org/wiki/Least_squares#Lasso_method
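
To make the idea concrete, here is a minimal sketch of an L1-penalized logistic regression fitted by proximal gradient descent, using only the Python standard library. Everything in it is an assumption made for illustration: the simulated data (two informative columns, three pure-noise columns), the helper names `fit_l1_logistic` and `soft_threshold`, and the penalty strength `lam=0.08`, which is arbitrary here and would normally be chosen by cross-validation. In practice you would use an established implementation rather than this hand-rolled loop.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks v toward zero,
    # and returns exactly 0.0 whenever |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam, lr=1.0, n_iter=300):
    """Minimise mean log-loss + lam * sum(|w_j|) by proximal gradient descent.

    The intercept b is left unpenalised.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            # Residual: predicted probability minus observed outcome.
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= lr * grad_b / n
        # Gradient step on the log-loss, then soft-threshold for the L1 penalty.
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(p)]
    return w, b


# Simulated data (hypothetical): only columns 0 and 1 affect the outcome;
# columns 2-4 are noise. All predictors are already on the same scale.
rng = random.Random(0)
n = 1000
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(n)]
y = [1 if rng.random() < sigmoid(2 * x[0] - 2 * x[1]) else 0 for x in X]

w, b = fit_l1_logistic(X, y, lam=0.08)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("coefficients:", [round(wj, 3) for wj in w])
print("selected columns:", selected)
```

Columns whose coefficient is exactly zero are the ones the lasso has dropped. The surviving coefficients are shrunk toward zero, so they should not be read as unpenalized estimates. Because the penalty acts on coefficient size, predictors should be put on a comparable scale before fitting.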

jmnavarro