Why is L2 regression good for handling multicollinearity?

Question

Looking for an intuitive explanation, thanks.

Possible duplicate of [Why is ridge regression called "ridge", why is it needed, and what happens when $\lambda$ goes to infinity?](https://stats.stackexchange.com/questions/151304/why-is-ridge-regression-called-ridge-why-is-it-needed-and-what-happens-when) or https://stats.stackexchange.com/questions/118712/why-does-ridge-estimate-become-better-than-ols-by-adding-a-constant-to-the-diago/119708#119708 — Sycorax, Mar 01 '19 at 22:05

score 0 · Answer 1 · answered Mar 01 '19 at 22:21

Take the case of two perfectly correlated predictors, x1 and x2. The data then only pin down the combined effect of the two, so either coefficient, w1 or w2, can go to +/- infinity as long as the other is adjusted to compensate, and we have an infinite number of solutions with the same fit.

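Adding L2 regularisation means that, of all these solutions (with the same mean squared error), there is a single best one, namely the one with the smallest L2 norm. (Strictly, that is what you get as the penalty becomes very small; a larger penalty also shrinks the coefficients and gives up a little fit.) Assuming we normalise the variables and the correlation is positive, so that x1 = x2, this solution has w1 = w2, i.e. we effectively use the average of x1 and x2.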
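To see this numerically, here is a minimal NumPy sketch. The data, the true coefficient of 3, the noise level and the penalty `lam` are all made up for illustration, and `ridge` is a hypothetical helper that simply solves the penalised normal equations. It first checks that very different (w1, w2) pairs give the same mean squared error, then that a small L2 penalty picks the pair with w1 = w2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Illustrative data: x2 is an exact copy of x1 (perfect correlation).
x1 = rng.standard_normal(n)
x1 = (x1 - x1.mean()) / x1.std()      # normalise
x2 = x1.copy()
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 0.1 * rng.standard_normal(n)

def mse(w):
    """Mean squared error of the linear fit X @ w."""
    return np.mean((y - X @ w) ** 2)

# Only the sum w1 + w2 matters for the fit; s is the best value of that sum.
s = (x1 @ y) / (x1 @ x1)
candidates = [
    np.array([s, 0.0]),
    np.array([0.0, s]),
    np.array([s + 1000.0, -1000.0]),  # huge coefficients, same predictions
]
mses = [mse(w) for w in candidates]

def ridge(X, y, lam):
    """Hypothetical helper: solve (X^T X + lam * I) w = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

w_ridge = ridge(X, y, lam=1e-3)

# Minimum-norm least-squares solution, for comparison.
w_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)

print("MSE of the three candidates:", mses)
print("ridge coefficients:         ", w_ridge)
print("minimum-norm coefficients:  ", w_min_norm)
```

Without the penalty, X^T X is singular here and the normal equations have no unique solution; adding `lam * I` makes the matrix invertible, which is the algebraic side of the same intuition.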
