Why does PCA need to consider perpendicular distance and cater for maximal variance?

Question

My questions are:

1. Why does PCA need to consider perpendicular distance?
2. Why does PCA need to cater for maximal variance?

For Question (1), from this link:

It shows that OLS minimizes the distance between the model and the dependent variable, e.g.,

while PCA minimizes the perpendicular distance between the data points and the PCA model line, e.g.,

Why does PCA need to consider perpendicular distance?

For Question (2), this link mentions that PCA aims to achieve two objectives:

The minimum error
The maximum variance

My second question is: why do we need maximum variance? Does this objective have anything to do with the previous question on perpendicular distance?

`why we need maximum variance` Just we put it is the goal. Why needs to consider perpendicular distance? It is that same goal (hint keywords: rotation; pythagorean theorem). — ttnphns, Nov 06 '17 at 08:55

score 4 · Answer 1 · answered Nov 06 '17 at 08:37

One way of stating the goal of PCA is to find the linear projection $W$ that gives you the "best" representation of your data for a given dimensionality. Here "best" means the representation with the minimal squared reconstruction error.
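To connect this to the "maximum variance" objective in the question, here is a short derivation sketch. The notation is an assumption introduced for illustration and is not part of the original answer: $x_i$ are the centered data points and $w$ is a single unit-length direction (the one-column special case of $W$). Because the projection $w w^\top x_i$ and the residual $x_i - w w^\top x_i$ are orthogonal, the Pythagorean theorem gives

$$
\underbrace{\sum_i \|x_i\|^2}_{\text{fixed by the data}}
= \underbrace{\sum_i (w^\top x_i)^2}_{\text{variance along } w \text{ (up to } 1/n)}
+ \underbrace{\sum_i \|x_i - w w^\top x_i\|^2}_{\text{squared perpendicular (reconstruction) error}}
$$

The left-hand side does not depend on $w$, so under these assumptions choosing $w$ to minimize the perpendicular error is the same as choosing $w$ to maximize the projected variance. The two objectives are one objective stated two ways.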

When looking at PCA from 2 dimensions to 1 dimension, as you do there, you are not actually trying to find the line that best predicts $y$ from $x$. Rather, you're trying to find the combination of $y$ and $x$ such that the new, combined value "best" represents all your initial 2-D points.
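As a rough numerical illustration of the 2-D to 1-D case, the sketch below scans candidate directions and checks which one minimizes the squared perpendicular error and which one maximizes the projected variance. The synthetic data, the angle grid, and the helper names `projected_variance` and `perpendicular_error` are all assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated 2-D data (illustrative assumption, not from the question)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(scale=0.3, size=500)
X = np.column_stack([x, y])
X = X - X.mean(axis=0)  # PCA operates on centered data


def projected_variance(theta):
    """Sum of squared coordinates after projecting onto the direction at angle theta."""
    w = np.array([np.cos(theta), np.sin(theta)])
    return np.sum((X @ w) ** 2)


def perpendicular_error(theta):
    """Sum of squared perpendicular distances from the points to the line at angle theta."""
    w = np.array([np.cos(theta), np.sin(theta)])
    residual = X - np.outer(X @ w, w)
    return np.sum(residual ** 2)


# Scan directions in [0, pi); a line at angle theta equals the line at theta + pi
thetas = np.linspace(0.0, np.pi, 1801)
variances = np.array([projected_variance(t) for t in thetas])
errors = np.array([perpendicular_error(t) for t in thetas])

best_by_variance = thetas[np.argmax(variances)]
best_by_error = thetas[np.argmin(errors)]

# First principal direction from the SVD of the centered data
_, _, vt = np.linalg.svd(X, full_matrices=False)
pc1 = vt[0]
pc1_angle = np.arctan2(pc1[1], pc1[0]) % np.pi  # remove the sign ambiguity

total = np.sum(X ** 2)  # constant: variance term + error term for every theta

print("angle maximizing variance:      ", best_by_variance)
print("angle minimizing perpendicular: ", best_by_error)
print("angle of first principal axis:  ", pc1_angle)
```

For every angle, `variances + errors` should equal `total` up to rounding, which is why the two searches land on the same direction.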

Essentially, PCA considers the perpendicular distance because it doesn't try to model $y$ as a function of $x$; it treats both variables symmetrically.
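A small sketch may make the contrast with regression concrete. It fits OLS in both directions and compares the slopes with the slope of the first principal axis. The data and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic positively correlated data (illustrative assumption)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(scale=0.5, size=1000)
xc, yc = x - x.mean(), y - y.mean()

# OLS of y on x: minimizes vertical distances to the line
slope_y_on_x = (xc @ yc) / (xc @ xc)

# OLS of x on y, rewritten as dy/dx: minimizes horizontal distances to the line
slope_x_on_y = (yc @ yc) / (xc @ yc)

# PCA: the first right singular vector is the direction that minimizes
# perpendicular distances
X = np.column_stack([xc, yc])
_, _, vt = np.linalg.svd(X, full_matrices=False)
pca_slope = vt[0, 1] / vt[0, 0]

# Swapping the roles of x and y changes the OLS line but not the PCA line
_, _, vt_swapped = np.linalg.svd(X[:, ::-1], full_matrices=False)
pca_slope_swapped = vt_swapped[0, 0] / vt_swapped[0, 1]  # convert back to dy/dx

print("OLS slope (y on x):      ", slope_y_on_x)
print("PCA slope:               ", pca_slope)
print("PCA slope (axes swapped):", pca_slope_swapped)
print("OLS slope (x on y):      ", slope_x_on_y)
```

The two OLS fits give different lines because each one singles out one variable as the response. The PCA line is the same whichever variable is listed first, which reflects that no variable plays the role of the dependent one.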
