It is clear that the first principal component gives the direction (line) that is closest to the data, but can someone prove why the first two principal components span the plane that is closest to the data?

kjetil b halvorsen
Rafi

1 Answer

The first PC (PC1) is the unit-norm linear combination of the (centered) variables that maximizes variance. If you replace each data point with its projection onto PC1, the result is closest to the data in the sense that it minimizes the (Euclidean) norm of the residual. PC2 then maximizes variance among all unit-norm linear combinations orthogonal to PC1.
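The closeness claim for PC1 can be checked numerically. The following is a minimal sketch, not part of the original answer: the data are synthetic, and the helper names `residual_ss` and `random_direction` are made up for illustration. It assumes the data are centered and that the principal directions are taken from the SVD of the centered data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; synthetic data for illustration only

# Hypothetical data: 200 points in 3 dimensions with unequal spread per axis.
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.5, 0.5])
Xc = X - X.mean(axis=0)  # PCA works on centered data

# Rows of Vt are the principal directions, ordered by decreasing variance.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)


def residual_ss(Xc, basis):
    """Sum of squared distances from the rows of Xc to span(basis).

    `basis` is a (p, k) array whose columns are orthonormal.
    """
    projection = Xc @ basis @ basis.T
    return float(np.sum((Xc - projection) ** 2))


def random_direction(rng, p):
    """Draw a random unit vector in R^p as a (p, 1) column."""
    w = rng.normal(size=p)
    return (w / np.linalg.norm(w)).reshape(p, 1)


pc1 = Vt[0].reshape(-1, 1)
best = residual_ss(Xc, pc1)
others = [residual_ss(Xc, random_direction(rng, 3)) for _ in range(1000)]

# Pythagoras: residual = total sum of squares - sum of squared scores,
# so maximizing the variance of the scores minimizes the residual.
total = float(np.sum(Xc ** 2))
captured = float(np.sum((Xc @ pc1) ** 2))

print(best, min(others), total - captured)
```

Because `residual_ss` accepts any orthonormal basis, passing `Vt[:2].T` gives the residual for the plane spanned by the first two directions, which can be compared against random planes in the same way.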

If you replace each data point with its projection onto the plane spanned by (PC1, PC2), this is again the plane closest to the point swarm in the sense of minimizing the Euclidean norm of the residual vector. See "Geometric understanding of PCA in the subject (dual) space".
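One way to sketch why this holds, using notation that is introduced here and is not in the original answer: let $X$ be the centered $n \times p$ data matrix, $S = X^\top X$, with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ and unit eigenvectors $u_1, u_2, \dots$ (the PC directions). Only planes through the centroid are considered. For any plane with orthonormal basis $V = [a, b]$, the Pythagorean theorem gives

$$
\|X - XVV^\top\|_F^2 \;=\; \|X\|_F^2 - \operatorname{tr}(V^\top S V) \;=\; \|X\|_F^2 - \left(a^\top S a + b^\top S b\right),
$$

so minimizing the residual is the same as maximizing $a^\top S a + b^\top S b$, and this quantity depends only on the plane, not on which orthonormal basis of it is chosen. Any plane meets the $(p-1)$-dimensional orthogonal complement of $u_1$ in at least a line, so the basis can be chosen with $b \perp u_1$. Then

$$
a^\top S a \le \lambda_1, \qquad b^\top S b \le \lambda_2 ,
$$

the first because $u_1$ maximizes variance over all unit vectors, the second because $u_2$ maximizes variance over unit vectors orthogonal to $u_1$. Hence $a^\top S a + b^\top S b \le \lambda_1 + \lambda_2$, with equality for $a = u_1$, $b = u_2$, so no plane has a smaller residual than the one spanned by PC1 and PC2.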

kjetil b halvorsen