What problem do regularization methods solve? I thought the point was feature selection and preventing overfitting. However, I was told that the reason Ridge, Lasso, and Elastic Net were created in the first place was "to deal with collinearity," and I am not readily finding anything online to support this.
Wikipedia says, "Regularization, in mathematics and statistics and particularly in the fields of machine learning and inverse problems, is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting."
Where does linear dependence among the columns come into play in regularization? For example, if the columns exhibit multicollinearity, how does regularization decide which features to keep and which to shrink towards zero?
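To make the question concrete, here is a small illustration of the behavior I am asking about (a sketch using scikit-learn, with made-up data: two nearly identical columns `x1` and `x2`, where only `x1` actually drives `y`):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly collinear copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

# Ridge (L2) tends to split the weight across the correlated columns;
# Lasso (L1) tends to keep one column and push the other towards zero.
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("Ridge coefficients:", ridge.coef_)
print("Lasso coefficients:", lasso.coef_)
```

In runs like this, Ridge spreads the coefficient mass over both correlated columns, while Lasso concentrates it on one of them. What I do not understand is the mechanism: how does the penalty "know" which of the two columns to keep?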