
I have a sparse regression problem (sparse because a few inputs are factors, so we have many columns of 1s and 0s). I am thinking of ridge regression because of the sparsity, but also because a lot of the terms will have interaction effects. I also want an interpretable model.

Is there a way to use the ridge penalty for a linear regression classifier? If not, is there any base learner that allows me to include interaction effects and still yields a sparse solution?

John
  • Check my answer here, where logistic regression can also use regularization: https://stats.stackexchange.com/questions/228763/regularization-methods-for-logistic-regression/228785#228785 – Haitao Du Apr 18 '17 at 13:44

1 Answer


Yes, ridge regression can be used as a classifier: just code the response labels as -1 and +1 and fit the regression model as normal. Allen's PRESS statistic (i.e. the leave-one-out estimate of the squared error) works fine as a model selection criterion, e.g. for selecting the ridge parameter. In my experience it works about as well as a linear support vector machine on most problems (the main reason the SVM is a good classifier is that it is regularised; the difference in the loss function is secondary). Note that this is also equivalent to a linear least-squares support vector machine (LS-SVM), which is a well-regarded classifier.
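As a minimal sketch of this approach (assuming scikit-learn, which the answer itself does not name): code the labels as -1/+1, pick the ridge parameter by the leave-one-out squared error (i.e. PRESS), and classify by the sign of the prediction. `RidgeCV` with its default `cv=None` uses an efficient leave-one-out scheme, so it matches the PRESS criterion directly; the alpha grid here is illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeCV

# Toy data; class labels coded as -1/+1 as described above
X, y01 = make_classification(n_samples=200, n_features=20, random_state=0)
y = 2 * y01 - 1

# cv=None (the default) selects alpha by an efficient leave-one-out
# estimate of the squared error, i.e. the PRESS statistic
model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)

# Classify by the sign of the regression output
pred = np.sign(model.predict(X))
print("chosen alpha:", model.alpha_)
print("training accuracy:", (pred == y).mean())
```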

Standard ridge regression will not give you sparsity, though; for that you want an L1 (LASSO-type) penalty, in which case I would recommend LARS, or perhaps L1-regularised logistic regression.
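For the sparse route, here is a minimal sketch (again assuming scikit-learn; the pipeline and the value of C are illustrative, not part of the answer): generate the pairwise interaction terms explicitly, then fit an L1-regularised logistic regression, which zeros out many coefficients and so addresses both the interaction effects and the interpretability asked about in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Add pairwise interaction terms, then fit with an L1 (LASSO-type) penalty;
# the 'liblinear' solver supports L1, and C controls the penalty strength
clf = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
).fit(X, y)

coefs = clf.named_steps["logisticregression"].coef_.ravel()
print(f"nonzero coefficients: {np.count_nonzero(coefs)} of {coefs.size}")
```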

Dikran Marsupial