Looking for an intuitive explanation, thanks.
Possible duplicate of [Why is ridge regression called "ridge", why is it needed, and what happens when $\lambda$ goes to infinity?](https://stats.stackexchange.com/questions/151304/why-is-ridge-regression-called-ridge-why-is-it-needed-and-what-happens-when) or https://stats.stackexchange.com/questions/118712/why-does-ridge-estimate-become-better-than-ols-by-adding-a-constant-to-the-diago/119708#119708 – Sycorax Mar 01 '19 at 22:05
1 Answer
Take the case of two perfectly correlated predictors, x1 and x2. The corresponding coefficients w1 and w2 can then go to $\pm\infty$ (as long as the other is adjusted to compensate), so there are infinitely many solutions with the same fit.
Adding L2 regularisation means that, among all these solutions (which have the same mean squared error), there is a single best one: the one with the smallest L2 norm. Assuming we normalise the variables (and they are positively correlated), this solution has w1 = w2, i.e. we effectively take the average of x1 and x2.
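
A minimal numerical sketch of the argument above, assuming synthetic data where x2 is an exact copy of x1 and a closed-form ridge fit without intercept. The helper name `ridge_fit`, the true coefficient 3.0, the noise level and the penalty `lam=1e-3` are illustrative assumptions, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two perfectly correlated, normalised predictors (x2 is an exact copy of x1).
x1 = rng.standard_normal(n)
x1 = (x1 - x1.mean()) / x1.std()
x2 = x1.copy()
X = np.column_stack([x1, x2])

# Hypothetical target: depends on the shared signal plus a little noise.
y = 3.0 * x1 + 0.1 * rng.standard_normal(n)

# The design matrix has rank 1, so ordinary least squares has no unique solution.
rank = np.linalg.matrix_rank(X)

# Two very different coefficient vectors give identical predictions,
# because only the sum w1 + w2 matters when x1 == x2.
w_a = np.array([3.0, 0.0])
w_b = np.array([103.0, -100.0])
same_fit = np.allclose(X @ w_a, X @ w_b)


def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X^T X + lam * I)^{-1} X^T y, no intercept."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


# A small L2 penalty makes the problem well-posed and picks w1 == w2.
w_ridge = ridge_fit(X, y, lam=1e-3)

# For comparison: the minimum-L2-norm least-squares solution.
w_min_norm = np.linalg.lstsq(X, y, rcond=None)[0]

print("rank of X:", rank)
print("same predictions for w_a and w_b:", same_fit)
print("ridge coefficients:", w_ridge)
print("minimum-norm coefficients:", w_min_norm)
```

With a small penalty the ridge coefficients should sit very close to the minimum-norm solution, splitting the weight equally between x1 and x2. A larger `lam` would additionally shrink both coefficients toward zero.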

seanv507