In PCA, is there an intuitive explanation for why the second principal component chosen must be orthogonal to the first component?

Question

In principal components analysis, the principal components are chosen sequentially. The first component is chosen to be the direction in the data with the greatest variance. The second is chosen to be the direction with the greatest variance, given that it is orthogonal to the first.

I am wondering if there is an intuitive way to understand this without resorting to the proof, from which it is hard for me to extract intuition. Thanks.

If it weren't orthogonal (i.e. uncorrelated), it would explain variance that is already captured by the first component! Also see the great answers here: https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues — Frans Rodenburg, Oct 06 '17 at 00:51
Is that the same as saying that we are creating principal components that are correlated, and hence may introduce a collinearity issue when trying to regress on these summary/principal component variables? — user321627, Oct 06 '17 at 00:57
Do you have experience applying PCA to data sets? If so, in those cases, what led you to use PCA? — user795305, Oct 06 '17 at 01:01
Usually I use it when the predictor variables have many dimensions and are all somewhat related to each other. I use it as a way to summarize the data as a whole into a few predictors that are linear combinations of the originals. — user321627, Oct 06 '17 at 01:06
They would indeed be correlated if they were not orthogonal. The original variables may or may not be too strongly correlated leading to multicollinearity, which is one of the reasons one might use PCA prior to regression (PCR). If regression is your goal, there exist other alternatives you may want to consider, such as PLS or ridge regression. What is the purpose of your research? — Frans Rodenburg, Oct 06 '17 at 01:14
If you didn't include the orthogonality constraint, the solution to the second optimization problem obviously would be the first principal component, because nothing would have changed. What constraint, then, do you propose erecting in place of orthogonality? (I don't see how your question could be answered without knowing what you have in mind.) — whuber, Oct 06 '17 at 14:36

score 6 · Accepted Answer · answered Oct 06 '17 at 08:03

Say you fit the first component by maximizing the variance. Now, try to fit the second component by again maximizing the variance, but don't enforce any orthogonality constraints. You'll find that the "second" component in this case is identical to the first, up to an arbitrary sign flip (i.e. it will point in either the same or the opposite direction as the first component). This is because the optimization problem hasn't changed at all. Solving for the "second" component in this case is simply asking for the first component all over again.
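To make this concrete, here is a minimal sketch in plain Python. The 3x3 covariance matrix `C` and the helper `max_variance_direction` are made up for illustration; the helper uses power iteration, which is just one simple way to find the unit vector w that maximizes the variance w^T C w. Running it without a constraint from two different starting points gives the same direction (up to sign), while adding the orthogonality constraint gives a genuinely new direction with smaller variance.

```python
import math

# Hypothetical covariance matrix, chosen only for illustration.
C = [[4.0, 2.0, 0.0],
     [2.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def variance_along(C, w):
    """Variance of the data projected onto the unit vector w: w^T C w."""
    return dot(w, matvec(C, w))

def max_variance_direction(C, start, orthogonal_to=(), iters=500):
    """Power iteration for the unit vector maximizing w^T C w,
    optionally restricted to be orthogonal to the given unit vectors."""
    w = normalize(start)
    for _ in range(iters):
        w = matvec(C, w)
        for u in orthogonal_to:
            # Remove the part of w that lies along u.
            p = dot(w, u)
            w = [a - p * b for a, b in zip(w, u)]
        w = normalize(w)
    return w

# First component: unconstrained variance maximization.
pc1 = max_variance_direction(C, [1.0, 0.0, 0.0])

# "Second" component WITHOUT a constraint: same problem, same answer,
# even though we start from a different vector.
again = max_variance_direction(C, [0.0, 0.0, 1.0])

# Second component WITH the orthogonality constraint.
pc2 = max_variance_direction(C, [0.0, 0.0, 1.0], orthogonal_to=[pc1])

print("|pc1 . again| =", abs(dot(pc1, again)))  # close to 1: same direction up to sign
print("pc1 . pc2     =", dot(pc1, pc2))         # close to 0: orthogonal
print("variances     =", variance_along(C, pc1), variance_along(C, pc2))
```

The only difference between the second and third calls is the `orthogonal_to` argument, which is exactly the difference between "asking for the first component again" and asking for the second one.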

What if we relax the constraint a bit, say, by allowing any direction except that of the first component? What is the result, then? Is it the orthogonal one? — Catbuilts, Oct 13 '20 at 19:11

score 1 · Answer 2 · answered Oct 06 '17 at 03:01

If the components weren't orthogonal, then you could project the second component onto the first. Look at the vectors a and b: they're not orthogonal. So, if you step in the direction of the a vector, you actually move both up and right. Hence, these would be correlated.
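The projection argument can be checked numerically. This is a small sketch under assumed values: the 2x2 covariance matrix `C` is hypothetical, `a` is its first principal direction (pointing up and to the right), and `b` is a non-orthogonal direction pointing straight up. The covariance between the scores along two directions u and v is u^T C v, so the sketch compares that covariance for `b` and for the part of `b` that remains after projecting out `a`.

```python
import math

# Hypothetical covariance matrix with known eigenvectors:
# (1, 1)/sqrt(2) with eigenvalue 3, and (1, -1)/sqrt(2) with eigenvalue 1.
C = [[2.0, 1.0],
     [1.0, 2.0]]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def score_covariance(C, u, v):
    """Covariance between the scores x.u and x.v, i.e. u^T C v."""
    Cv = [dot(row, v) for row in C]
    return dot(u, Cv)

s = 1.0 / math.sqrt(2.0)
a = [s, s]        # first principal direction: "up and right"
b = [0.0, 1.0]    # candidate second direction: "up", not orthogonal to a

# Split b into the part along a and the leftover part orthogonal to a.
overlap = dot(a, b)
residual = [bi - overlap * ai for ai, bi in zip(a, b)]

cov_ab = score_covariance(C, a, b)                # nonzero: scores are correlated
cov_a_residual = score_covariance(C, a, residual) # zero: overlap with a removed

print("a . b                 =", overlap)
print("cov(score_a, score_b) =", cov_ab)
print("cov(score_a, score_r) =", cov_a_residual)
```

Because `a` is an eigenvector of `C`, the covariance with any direction v equals the eigenvalue times a . v, so it vanishes exactly when v is orthogonal to `a`. That clean relationship holds for the first principal direction, not for an arbitrary pair of directions.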
