I'm trying to wrap my head around the L2 regularization component in ridge regression so that I can build a model on noisy, correlated data.
I understand that $\lambda$ introduces a penalty on the magnitude of the coefficients during the fitting stage, trading some bias for lower variance, and that this can reduce the chance of overfitting when using highly correlated or multicollinear variables.
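To make the setting concrete, here is a minimal sketch of the kind of situation I mean, using NumPy and scikit-learn's `Ridge`; the data-generating setup is purely illustrative, not my actual data:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 200

# Two nearly collinear predictors plus a noisy response
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=1.0, size=n)

# Increasing alpha (scikit-learn's name for lambda) shrinks the
# coefficients and stabilizes them despite the collinearity
for alpha in [1e-6, 1.0, 100.0]:
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:g}: coefficients = {coefs}")
```

With `alpha` near zero the two coefficients are erratic (the collinearity makes them nearly unidentifiable), while larger values pull them toward similar, smaller values.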
My questions are:
- what is the relationship to PCA, which I understand 'fixes' the problem of collinearity by rotating the variables onto their maximally uncorrelated principal components (the eigenvectors of the covariance matrix)?
- in general, should you do PCA before you try to fit anything?
- does PCA offer any guarantees on the collinearity of its outputs, along the lines of the check sketched below?
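For the last question, this is the sort of empirical check I have in mind, again just a sketch with scikit-learn's `PCA` on toy data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Correlated inputs, as in the ridge example above
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + rng.normal(scale=0.05, size=200)])

# Project onto the principal components
scores = PCA(n_components=2).fit_transform(X)

# If the components are uncorrelated, the off-diagonal
# entries here should be ~0 up to numerical precision
print(np.cov(scores, rowvar=False))
```

Empirically the off-diagonals come out near zero, but I'd like to understand whether that is a guaranteed property of PCA or an artifact of this example.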