I'm trying to wrap my head around the L2 regularization component in ridge regression so that I can build a model on noisy, correlated data.
I understand that $\lambda$ introduces a penalty on the magnitude of the coefficients during the fitting stage, trading some bias for lower variance, and that this can reduce the chance of overfitting when using highly correlated or multicollinear variables.
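To make the setting concrete, here is a minimal sketch of the kind of situation I mean, using NumPy and scikit-learn's `Ridge`; the data-generating setup is purely illustrative, not my actual data:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 200

# Two nearly collinear predictors plus a noisy response
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=1.0, size=n)

# Increasing alpha (scikit-learn's name for lambda) shrinks the
# coefficients and stabilizes them despite the collinearity
for alpha in [1e-6, 1.0, 100.0]:
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:g}: coefficients = {coefs}")
```

With `alpha` near zero the two coefficients are erratic (the collinearity makes them nearly unidentifiable), while larger values pull them toward similar, smaller values.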
My questions are:
- what is the relationship to PCA, which I understand 'fixes' the problem of collinearity by rotating the variables onto their maximally uncorrelated principal components (the eigenvectors of the covariance matrix)?
- in general, should you do PCA before you try to fit anything?
- does PCA offer any guarantees on the collinearity of its outputs, along the lines of the check sketched below?
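For the last question, this is the sort of empirical check I have in mind, again just a sketch with scikit-learn's `PCA` on toy data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Correlated inputs, as in the ridge example above
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + rng.normal(scale=0.05, size=200)])

# Project onto the principal components
scores = PCA(n_components=2).fit_transform(X)

# If the components are uncorrelated, the off-diagonal
# entries here should be ~0 up to numerical precision
print(np.cov(scores, rowvar=False))
```

Empirically the off-diagonals come out near zero, but I'd like to understand whether that is a guaranteed property of PCA or an artifact of this example.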