Does the first principal component maximize variance on the two variables with greatest covariance or on all variables simultaneously?

Question

Suppose I am performing PCA on 3 standardized variables: height, weight, and income. I understand that each principal component maximizes variance along a new line, but there are two ways I can see this happening, and I am unsure which is accurate:

1) The first principal component considers all three variables and finds the line of greatest variation through the entire three-dimensional data cloud.

vs.

2) The first principal component is calculated to maximize variance in the two-dimensional plane spanned by the two variables with the greatest covariance.

I suspect the first explanation is true, but given that each covariance value in the covariance matrix considers no more than 2 variables, I am unsure of my logic.

See in many places here, such as https://stats.stackexchange.com/a/22571/3277 — ttnphns, Jul 14 '19 at 20:41

score 1 · Accepted Answer · answered Jul 14 '19 at 20:48

The first PC (like every PC) is a linear combination of all the feature axes, not just of the two with the largest (or largest absolute) covariance. It doesn't focus on pairs only. To see why, consider data samples that all lie in a hyperplane, i.e. $\sum_i c_i x_i = a$, where the $x_i$ are your feature axes. The first PC must lie in that hyperplane, but if your second explanation were true, the PC would be built from only two of your feature axes, and in general such a direction cannot be placed in the hyperplane.
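As a rough numerical illustration (a sketch, not part of the original answer): the variance of the data along a unit direction $w$ is $w^\top \Sigma w = \sum_i \sum_j w_i w_j \Sigma_{ij}$, so although each entry of the covariance matrix involves at most two variables, the quantity PCA maximizes combines all of the entries at once. The correlation matrix `R` below is made up for the height/weight/income example, and the variable names are hypothetical. The code compares the first PC of the full matrix with the best direction confined to the plane of the two most strongly correlated variables.

```python
import numpy as np

# Hypothetical correlation matrix for standardized height, weight, income
# (illustrative values only, not taken from real data).
R = np.array([
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.4],
    [0.3, 0.4, 1.0],
])

# eigh returns eigenvalues in ascending order for a symmetric matrix,
# so the last column of the eigenvector matrix is the first PC.
eigenvalues, eigenvectors = np.linalg.eigh(R)
top_eigenvalue = eigenvalues[-1]
pc1 = eigenvectors[:, -1]
pc1 = pc1 * np.sign(pc1[0])  # the sign of an eigenvector is arbitrary; fix it

# Best unit direction restricted to the height-weight plane, i.e. the
# pair with the greatest covariance (0.8), with zero weight on income.
pair_eigenvalues, pair_eigenvectors = np.linalg.eigh(R[:2, :2])
pair_direction = np.append(pair_eigenvectors[:, -1], 0.0)
pair_variance = pair_direction @ R @ pair_direction

print("PC1 loadings (height, weight, income):", pc1)
print("variance along PC1:", top_eigenvalue)
print("variance along best height-weight direction:", pair_variance)
```

With these made-up numbers, the first PC has a nonzero loading on income as well, and the variance along it exceeds the variance along the best direction in the height-weight plane, which is consistent with explanation 1 in the question.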
