
Problem (PCA):

Assume that $p = 2$ and the predictors are centered. Show that the sum of squared perpendicular distances from $(x_{i1}, x_{i2})$, $i = 1, 2, \dots, n$, to the line $a_{2}x_{1} - a_{1}x_{2} = 0$ with $a_1^2 + a_2^2 = 1$ is minimized when $a_1 = \varphi_{11}$ and $a_2 = \varphi_{21}$, where $\varphi_1 = (\varphi_{11}, \varphi_{21})$ is a norm-one eigenvector associated with the eigenvalue $\lambda_1$.

I know that maximizing the variance along the principal component is equivalent to minimizing the reconstruction error, i.e., the sum of squared perpendicular distances from the component. I don't know how to apply this idea here. I was wondering if you could give me some hints for solving this problem.
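To make the equivalence concrete, here is a minimal numerical sketch (the synthetic data are my own, and I am assuming $\lambda_1$ denotes the largest eigenvalue of the sample covariance matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic centered 2-D data (illustrative only).
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[3.0, 1.2], [1.2, 1.0]], size=500)
X = X - X.mean(axis=0)

# Eigenvectors of the sample covariance matrix; eigh sorts eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
phi1 = eigvecs[:, -1]   # eigenvector of the largest eigenvalue, (phi_11, phi_21)

def ssq_perp(a1, a2):
    """Sum of squared perpendicular distances to the line a2*x1 - a1*x2 = 0,
    assuming a1**2 + a2**2 = 1."""
    return np.sum((a2 * X[:, 0] - a1 * X[:, 1]) ** 2)

# Scan unit vectors (a1, a2) = (cos t, sin t) and compare with the first eigenvector.
angles = np.linspace(0.0, np.pi, 1000)
losses = [ssq_perp(np.cos(t), np.sin(t)) for t in angles]
best = angles[int(np.argmin(losses))]

print("best direction from scan:", np.cos(best), np.sin(best))
print("first eigenvector:       ", phi1)
print("loss at eigenvector:     ", ssq_perp(*phi1))
```

The scan's best $(a_1, a_2)$ agrees with $\pm\varphi_1$, which is what I would like to show analytically.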

  • Here, $\lambda_1$ should be the minimum eigenvalue of the data covariance matrix. Can you please verify it? The sum of squared errors is minimised along the first eigenvector's direction, but that line is represented not by the first eigenvector but by the second. A line $ax+by=0$ has the vector $[a,b]$ as its *normal* (it is not parallel to it). – gunes Dec 05 '20 at 14:38
  • @gunes How can I verify this? – Simpson's Paradox Dec 05 '20 at 14:59
  • Checking the source of the question, maybe? Is it from a book? The question doesn't specify exactly what $\lambda_1$ means. – gunes Dec 05 '20 at 15:06
  • By using the information in my post at https://stats.stackexchange.com/a/71303/919 you should have no trouble obtaining this result. – whuber Dec 05 '20 at 15:16
  • @gunes It is a principal component analysis related exercise. It is not from the book. – Simpson's Paradox Dec 05 '20 at 16:58
  • What exactly is $\lambda_1$? – gunes Dec 05 '20 at 17:02
  • @gunes My best guess is the proportion of variance explained: the PC $Z_k$ has variance $\lambda_k$ and the total variance is $\sum_{k=1}^{p} \lambda_k$. – Simpson's Paradox Dec 05 '20 at 17:10
  • To your best guess, is $\lambda_1$ the maximum or minimum eigenvalue? – gunes Dec 05 '20 at 17:13
  • I think it's the maximum eigenvalue. – Simpson's Paradox Dec 05 '20 at 17:17
  • That's the part I don't understand. If $[a,b]$ is the first eigenvector, the line parallel to it will be $-bx+ay=0$ in the $xy$ plane, not $ax+by=0$. – gunes Dec 05 '20 at 17:18
  • @gunes What I have from the book is: when $p = 2$, the first PC defines the line that is as close as possible to the data, in the sense of minimizing the sum of squared perpendicular distances between each point and the line. Projecting the observations onto this line gives the scores on the 1st PC, which is the best one-dimensional summary of the data. – Simpson's Paradox Dec 05 '20 at 17:19
  • You might find the animated plots at https://stats.stackexchange.com/a/140579/919 to be helpful. – whuber Dec 05 '20 at 17:22
  • @whuber I know how PCA works in the big picture. I just couldn't make sense of the above problem. I appreciate your link. – Simpson's Paradox Dec 05 '20 at 17:24
  • @gunes I have a correction: the line equation is $a_{2}x_1 - a_{1}x_2 = 0$. – Simpson's Paradox Dec 07 '20 at 01:20
  • @Simpson'sParadox Now it makes more sense to me. Can you also update the original post? – gunes Dec 07 '20 at 09:17
  • @gunes Thank you so much! I modified the question. – Simpson's Paradox Dec 07 '20 at 14:26

1 Answer


The sum of squared perpendicular distances from the points $x_i = (x_{i1}, x_{i2})^T$ to the line is $$\sum_i(a_2x_{i1}-a_1x_{i2})^2=\sum_{i}(a_p^Tx_i)^2=\sum_i a_p^Tx_ix_i^Ta_p=a_p^T\left(\sum_i x_ix_i^T \right)a_p,$$ since the distance from $x_i$ to the line is $|a_2x_{i1}-a_1x_{i2}|/\sqrt{a_1^2+a_2^2}=|a_2x_{i1}-a_1x_{i2}|$, where $a_p=[a_2 \ {-a_1}]^T$ is the unit normal to the line.

The matrix inside the parentheses is the scatter matrix $X^TX$ (a scaled sample covariance matrix when the predictors are centered); call it $S$. Note that $a_p$ is perpendicular to $a=[a_1\ a_2]^T$, the direction of the line. The expression is a quadratic form, so the problem becomes

$$\min_{a_p \,:\, \|a_p\|=1} a_p^T S a_p.$$
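One way to see the solution (the standard Rayleigh-quotient argument, sketched here): write $S = \lambda_1 q_1 q_1^T + \lambda_2 q_2 q_2^T$ with orthonormal eigenvectors $q_1, q_2$ and eigenvalues $\lambda_1 \ge \lambda_2$, and expand $a_p = c_1 q_1 + c_2 q_2$ with $c_1^2 + c_2^2 = 1$. Then

$$a_p^T S a_p = \lambda_1 c_1^2 + \lambda_2 c_2^2 \ge \lambda_2,$$

with equality exactly when $c_1 = 0$, i.e. $a_p = \pm q_2$.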

So the minimizer $a_p$ is the eigenvector corresponding to the minimum eigenvalue of $S$ (call it $\lambda_2$), and $a$, being the unit vector perpendicular to $a_p$, is the eigenvector corresponding to the maximum eigenvalue, $\lambda_1$.

Note: the scatter matrix and the sample covariance matrix differ only by a scalar factor, so their eigenvalues are scaled versions of each other and their eigenvectors are the same.
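A quick numerical check of this note (a sketch only; the synthetic data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2)) @ np.array([[2.0, 0.3], [0.3, 0.8]])
X = X - X.mean(axis=0)              # centered predictors

S = X.T @ X                         # scatter matrix X^T X
C = np.cov(X, rowvar=False)         # sample covariance = S / (n - 1)

vals_S, vecs_S = np.linalg.eigh(S)  # eigenvalues in ascending order
vals_C, vecs_C = np.linalg.eigh(C)

print(np.allclose(vals_S, (len(X) - 1) * vals_C))   # eigenvalues scale by n - 1
print(np.allclose(np.abs(vecs_S), np.abs(vecs_C)))  # eigenvectors agree up to sign
```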

gunes