
I have a sparse regression problem (sparse because a few inputs are factors, so we have many columns of 1s and 0s). I am thinking of ridge regression because of the sparsity, but also because a lot of the terms will have interaction effects. I also want an interpretable model.

Is there a way to use the ridge penalty for a linear regression classifier? If not, is there any base learner that allows me to include interaction effects and still yields a sparse solution?

John
  • Check my answer here, where logistic regression can also use regularization: https://stats.stackexchange.com/questions/228763/regularization-methods-for-logistic-regression/228785#228785 – Haitao Du Apr 18 '17 at 13:44

1 Answer


Yes, ridge regression can be used as a classifier: just code the response labels as -1 and +1 and fit the regression model as normal. Allen's PRESS statistic (i.e. the leave-one-out estimate of the squared error) works fine as a model selection criterion, e.g. for selecting the ridge parameter. In my experience it works about as well as a linear support vector machine on most problems (the main reason the SVM is a good classifier is that it is regularised; the difference in the loss function is secondary). Note that this is also equivalent to a linear least-squares support vector machine (LS-SVM), which is a well-regarded classifier.
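As a minimal sketch of this approach (assuming scikit-learn, which the answer itself does not name): code the labels as -1/+1, pick the ridge parameter by the leave-one-out squared error (i.e. PRESS), and classify by the sign of the prediction. `RidgeCV` with its default `cv=None` uses an efficient leave-one-out scheme, so it matches the PRESS criterion directly; the alpha grid here is illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeCV

# Toy data; class labels coded as -1/+1 as described above
X, y01 = make_classification(n_samples=200, n_features=20, random_state=0)
y = 2 * y01 - 1

# cv=None (the default) selects alpha by an efficient leave-one-out
# estimate of the squared error, i.e. the PRESS statistic
model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)

# Classify by the sign of the regression output
pred = np.sign(model.predict(X))
print("chosen alpha:", model.alpha_)
print("training accuracy:", (pred == y).mean())
```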

Standard ridge regression will not give you sparsity, though; for that you want an L1 (LASSO-type) penalty, in which case I would recommend LARS, or perhaps L1-regularised logistic regression.
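For the sparse route, here is a minimal sketch (again assuming scikit-learn; the pipeline and the value of C are illustrative, not part of the answer): generate the pairwise interaction terms explicitly, then fit an L1-regularised logistic regression, which zeros out many coefficients and so addresses both the interaction effects and the interpretability asked about in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Add pairwise interaction terms, then fit with an L1 (LASSO-type) penalty;
# the 'liblinear' solver supports L1, and C controls the penalty strength
clf = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
).fit(X, y)

coefs = clf.named_steps["logisticregression"].coef_.ravel()
print(f"nonzero coefficients: {np.count_nonzero(coefs)} of {coefs.size}")
```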

Dikran Marsupial