It is clear that the first principal component is the vector which is the closest to the data, but can someone prove why the first two principal components span a plane that is the closest to the data?
I don't think closest to the data is correct. – SmallChess Mar 10 '17 at 14:17
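The first PC (PC1) is the linear combination (with a unit-norm coefficient vector) that maximizes variance. If you replace each data point with its orthogonal projection onto the PC1 direction, the result is closest to the data in the sense that it minimizes the sum of squared Euclidean norms of the residuals. This is the same thing as maximizing variance: each point's squared norm splits into its squared projection plus its squared residual (Pythagoras), so the direction that captures the most variance leaves the least residual. Now, PC2 maximizes variance among all linear combinations orthogonal to PC1.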
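A minimal NumPy sketch of the PC1 claim, assuming centered data. The 3-D point cloud, the seed and the helper `residual_ss` are hypothetical, chosen only for illustration. PC1 is computed as the leading right singular vector of the centered data matrix, and the sketch checks that no random unit direction gives a smaller sum of squared residuals, and that the PC1 residual equals the variance left in the other directions.

```python
import numpy as np

def residual_ss(X, U):
    """Sum of squared Euclidean residuals after projecting the rows of X
    onto the column space of U (U must have orthonormal columns)."""
    R = X - X @ U @ U.T
    return float(np.sum(R ** 2))

rng = np.random.default_rng(0)
# Hypothetical example data: 200 correlated points in 3-D.
A = np.array([[3.0, 0.0, 0.0],
              [1.0, 1.5, 0.0],
              [0.5, 0.3, 0.4]])
X = rng.standard_normal((200, 3)) @ A
X = X - X.mean(axis=0)            # PCA works on centered data

# PC1 = leading right singular vector of the centered data matrix.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = Vt[0][:, None]              # shape (3, 1)
best = residual_ss(X, pc1)

# Residuals for many random unit directions, for comparison.
others = []
for _ in range(1000):
    u = rng.standard_normal((3, 1))
    u /= np.linalg.norm(u)
    others.append(residual_ss(X, u))

print(best <= min(others))                    # expected: True
# Pythagoras: total = projected + residual, so the PC1 residual is the sum of
# the squared singular values other than the first.
print(np.isclose(best, np.sum(s[1:] ** 2)))   # expected: True
```

Replacing `pc1` with `Vt[:2].T` (the first two PCs) and the random directions with random orthonormal pairs gives the same kind of check for the plane.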
If you now replace each data point with its projection onto the plane spanned by PC1 and PC2, this is the plane closest to the point swarm, again in the sense of minimizing the sum of squared Euclidean norms of the residual vectors. See Geometric understanding of PCA in the subject (dual) space.
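For the plane, the argument can be written out as follows. This assumes centered data (so the plane passes through the mean), writes S for the sample covariance matrix and U for an orthonormal basis of an arbitrary plane; the notation is introduced here for illustration and is not taken from the linked thread.

```latex
% Centered data x_1,...,x_n in R^p, S = (1/n) \sum_i x_i x_i^T,
% eigenpairs (\lambda_k, v_k) with \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p, V = [v_1,...,v_p].
% Any plane through the origin: U = [u_1, u_2] \in R^{p \times 2} with U^T U = I_2.
\begin{align*}
\sum_{i=1}^n \lVert x_i - UU^\top x_i \rVert^2
  &= \sum_{i=1}^n \lVert x_i \rVert^2 - \sum_{i=1}^n \lVert U^\top x_i \rVert^2
     && \text{(Pythagoras: } UU^\top \text{ is an orthogonal projector)} \\
  &= \sum_{i=1}^n \lVert x_i \rVert^2 - n\,\operatorname{tr}(U^\top S U), \\[4pt]
\operatorname{tr}(U^\top S U)
  &= \sum_{j=1}^{2} \sum_{k=1}^{p} \lambda_k \,(u_j^\top v_k)^2
   = \sum_{k=1}^{p} \lambda_k w_k,
  \qquad w_k = \lVert U^\top v_k \rVert^2, \\
0 \le w_k &\le 1, \qquad \sum_{k=1}^{p} w_k = \lVert U^\top V \rVert_F^2 = 2
  \;\Longrightarrow\; \operatorname{tr}(U^\top S U) \le \lambda_1 + \lambda_2 .
\end{align*}
```

So minimizing the residual over planes is the same as maximizing the captured variance tr(UᵀSU), and the bound λ₁ + λ₂ is reached at U = [v₁, v₂] (w₁ = w₂ = 1), i.e. the plane spanned by PC1 and PC2; the minimal residual is then n(λ₃ + … + λₚ). This is Ky Fan's maximum principle, and the same argument with k columns gives the best k-dimensional subspace. If λ₂ = λ₃ the optimal plane is not unique.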

kjetil b halvorsen