
From the thread Should one remove highly correlated variables before doing PCA? we learn that highly correlated features can cause PCA to produce a principal component with a misleadingly high variance, so it is sometimes suggested to remove them beforehand.

On the other hand, PCA is also used as a remedy for multicollinearity.

Do these two statements contradict each other? My understanding is that PCA simply aggregates the correlated features into a single component.
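The effect described in the linked thread can be seen in a small experiment. The sketch below (my own illustration, not from the linked answer) compares the variance share of the first principal component for three roughly independent features versus the same data with a near-duplicate of one feature appended; the redundant copy pulls PC1 toward that direction and inflates its share:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Three roughly independent standardized features: covariance ~ identity,
# so no principal component dominates.
X = rng.standard_normal((n, 3))

# Append a near-duplicate of the first feature (correlation ~ 1).
X_dup = np.column_stack([X, X[:, 0] + 0.01 * rng.standard_normal(n)])

def pc1_variance_ratio(data):
    """Fraction of total variance captured by the first principal component."""
    cov = np.cov(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)  # ascending order
    return eigvals[-1] / eigvals.sum()

print(pc1_variance_ratio(X))      # ~1/3: variance spread evenly
print(pc1_variance_ratio(X_dup))  # ~1/2: the duplicated feature dominates PC1
```

The duplicated pair contributes twice to the total variance along one direction, so PC1's share jumps from about a third to about a half even though no new information was added.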

user6703592
  • The accepted answer concludes: 'We can see now that there may be merit in discarding variables thought to be measuring the same underlying (but "latent") aspect of a collection of variables, because including the nearly-redundant variables can cause the PCA to overemphasize their contribution. There is nothing mathematically right (or wrong) about such a procedure; it's a judgment call based on the analytical objectives and knowledge of the data.' I.e. there is no contradiction, but it is a matter of taste and judgement whether it should be done – ReneBt Oct 15 '20 at 11:14
  • It's not generally true that highly correlated features should be removed prior to PCA, and the accepted answer in the thread you linked doesn't make such a recommendation. It shows how such correlations affect PCA, and says that your problem/goals should determine whether to remove them or not. – user20160 Oct 15 '20 at 11:23

0 Answers