I am learning about ridge regression and know that it tends to work better than ordinary least squares in the presence of multicollinearity. Why is this true? Either an intuitive answer or a mathematical one would be satisfying (both would be even better).
Also, I know that the ridge estimate $\hat{\beta}$ can always be obtained, but how well does ridge regression work in the presence of exact collinearity (one independent variable is a linear function of another)?
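
To make the second question concrete, here is a minimal sketch of the exact-collinearity case I have in mind. Everything in it is an assumption chosen for illustration: the simulated data, the relation `x2 = 2 * x1`, the penalty `lam = 1.0`, and the absence of an intercept. It uses the usual closed form $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$.

```python
import numpy as np

# Toy data (assumed for illustration): the second predictor is an exact
# multiple of the first, so the design matrix is rank deficient.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = 2.0 * x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)

# X has two columns but only rank 1, so X'X is singular and the
# ordinary least squares normal equations have no unique solution.
rank_X = np.linalg.matrix_rank(X)

# Ridge closed form: adding lam * I (lam > 0) makes the matrix positive
# definite, so the linear system has a unique solution.
lam = 1.0  # hypothetical penalty value
gram = X.T @ X
beta_ridge = np.linalg.solve(gram + lam * np.eye(X.shape[1]), X.T @ y)

print("rank of X:", rank_X)
print("ridge coefficients:", beta_ridge)
```

In this sketch the ridge system does have a unique solution even though $X^\top X$ is singular, which is what I mean by "$\hat{\beta}$ can always be obtained". What I am unsure about is how to judge the quality of that solution.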