
Problem (PCA):

Assume that $p = 2$ and the predictors are centered. Show that the sum of squared perpendicular distances from $(x_{i1}, x_{i2})$, $i = 1, 2, \dots, n$, to the line $a_{2}x_{1} - a_{1}x_{2} = 0$ with $a_1^2 + a_2^2 = 1$ is minimized when $a_1 = \varphi_{11}$ and $a_2 = \varphi_{21}$, where $\varphi_1 = (\varphi_{11}, \varphi_{21})$ is a norm-one eigenvector associated with the eigenvalue $\lambda_1$.

I know that maximizing the variance along the principal component is equivalent to minimizing the reconstruction error, i.e., the sum of squared perpendicular distances from the component. I don't know how to apply this idea here. I was wondering if you could give me some hints for solving this problem.
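To make the equivalence concrete, here is a minimal numerical sketch (the synthetic data are my own, and I am assuming $\lambda_1$ denotes the largest eigenvalue of the sample covariance matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic centered 2-D data (illustrative only).
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[3.0, 1.2], [1.2, 1.0]], size=500)
X = X - X.mean(axis=0)

# Eigenvectors of the sample covariance matrix; eigh sorts eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
phi1 = eigvecs[:, -1]   # eigenvector of the largest eigenvalue, (phi_11, phi_21)

def ssq_perp(a1, a2):
    """Sum of squared perpendicular distances to the line a2*x1 - a1*x2 = 0,
    assuming a1**2 + a2**2 = 1."""
    return np.sum((a2 * X[:, 0] - a1 * X[:, 1]) ** 2)

# Scan unit vectors (a1, a2) = (cos t, sin t) and compare with the first eigenvector.
angles = np.linspace(0.0, np.pi, 1000)
losses = [ssq_perp(np.cos(t), np.sin(t)) for t in angles]
best = angles[int(np.argmin(losses))]

print("best direction from scan:", np.cos(best), np.sin(best))
print("first eigenvector:       ", phi1)
print("loss at eigenvector:     ", ssq_perp(*phi1))
```

The scan's best $(a_1, a_2)$ agrees with $\pm\varphi_1$, which is what I would like to show analytically.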

  • Here, $\lambda_1$ should be the minimum eigenvalue of the data covariance matrix. Can you please verify it? The sum of squared errors is minimised along the first eigenvector's direction, but that line is represented not by the first eigenvector but by the second. A line $ax+by=0$ has the vector $[a,b]$ as its *normal* (it is not parallel to it). – gunes Dec 05 '20 at 14:38
  • @gunes How can I verify this? – Simpson's Paradox Dec 05 '20 at 14:59
  • Checking the source of the question, maybe? Is it from a book? The question doesn't specify exactly what $\lambda_1$ means. – gunes Dec 05 '20 at 15:06
  • By using the information in my post at https://stats.stackexchange.com/a/71303/919 you should have no trouble obtaining this result. – whuber Dec 05 '20 at 15:16
  • @gunes It is a principal component analysis related exercise. It is not from the book. – Simpson's Paradox Dec 05 '20 at 16:58
  • What exactly is $\lambda_1$? – gunes Dec 05 '20 at 17:02
  • @gunes My best guess is the proportion of variance explained: the PC $Z_k$ has variance $\lambda_k$ and the total variance is $\sum_{k=1}^{p} \lambda_k$. – Simpson's Paradox Dec 05 '20 at 17:10
  • To your best guess, is $\lambda_1$ the maximum or minimum eigenvalue? – gunes Dec 05 '20 at 17:13
  • I think it's the maximum eigenvalue. – Simpson's Paradox Dec 05 '20 at 17:17
  • That's the part I don't understand. If $[a,b]$ is the first eigenvector, the line parallel to it will be $-bx+ay=0$ in the $xy$ plane, not $ax+by=0$. – gunes Dec 05 '20 at 17:18
  • @gunes What I have from the book is: when $p = 2$, the first PC defines the line that is as close as possible to the data, in the sense of minimizing the sum of squared perpendicular distances between each point and the line. Projecting the observations onto this line gives the scores on the 1st PC, which is the best one-dimensional summary of the data. – Simpson's Paradox Dec 05 '20 at 17:19
  • You might find the animated plots at https://stats.stackexchange.com/a/140579/919 to be helpful. – whuber Dec 05 '20 at 17:22
  • @whuber I know how PCA works in the big picture. I just couldn't make sense of the above problem. I appreciate your link. – Simpson's Paradox Dec 05 '20 at 17:24
  • @gunes I have a correction: the line equation is $a_{2}x_1 - a_{1}x_2 = 0$. – Simpson's Paradox Dec 07 '20 at 01:20
  • @Simpson'sParadox Now it makes more sense to me. Can you also update the original post? – gunes Dec 07 '20 at 09:17
  • @gunes Thank you so much! I modified the question. – Simpson's Paradox Dec 07 '20 at 14:26

1 Answer


The sum of squared perpendicular distances from the points $x_i = (x_{i1}, x_{i2})^T$ to the line is $$\sum_i(a_2x_{i1}-a_1x_{i2})^2=\sum_{i}(a_p^Tx_i)^2=\sum_i a_p^Tx_ix_i^Ta_p=a_p^T\left(\sum_i x_ix_i^T \right)a_p,$$ since the distance from $x_i$ to the line is $|a_2x_{i1}-a_1x_{i2}|/\sqrt{a_1^2+a_2^2}=|a_2x_{i1}-a_1x_{i2}|$, where $a_p=[a_2 \ {-a_1}]^T$ is the unit normal to the line.

The matrix inside the parentheses is the scatter matrix $X^TX$ (a scaled sample covariance matrix when the predictors are centered); call it $S$. Note that $a_p$ is perpendicular to $a=[a_1\ a_2]^T$, the direction of the line. The expression is a quadratic form, so the problem becomes

$$\min_{a_p \,:\, \|a_p\|=1} a_p^T S a_p.$$
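One way to see the solution (the standard Rayleigh-quotient argument, sketched here): write $S = \lambda_1 q_1 q_1^T + \lambda_2 q_2 q_2^T$ with orthonormal eigenvectors $q_1, q_2$ and eigenvalues $\lambda_1 \ge \lambda_2$, and expand $a_p = c_1 q_1 + c_2 q_2$ with $c_1^2 + c_2^2 = 1$. Then

$$a_p^T S a_p = \lambda_1 c_1^2 + \lambda_2 c_2^2 \ge \lambda_2,$$

with equality exactly when $c_1 = 0$, i.e. $a_p = \pm q_2$.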

So the minimizer $a_p$ is the eigenvector corresponding to the minimum eigenvalue of $S$ (call it $\lambda_2$), and $a$, being the unit vector perpendicular to $a_p$, is the eigenvector corresponding to the maximum eigenvalue, $\lambda_1$.

Note: the scatter matrix and the sample covariance matrix differ only by a scalar factor, so their eigenvalues are scaled versions of each other and their eigenvectors are the same.
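A quick numerical check of this note (a sketch only; the synthetic data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2)) @ np.array([[2.0, 0.3], [0.3, 0.8]])
X = X - X.mean(axis=0)              # centered predictors

S = X.T @ X                         # scatter matrix X^T X
C = np.cov(X, rowvar=False)         # sample covariance = S / (n - 1)

vals_S, vecs_S = np.linalg.eigh(S)  # eigenvalues in ascending order
vals_C, vecs_C = np.linalg.eigh(C)

print(np.allclose(vals_S, (len(X) - 1) * vals_C))   # eigenvalues scale by n - 1
print(np.allclose(np.abs(vecs_S), np.abs(vecs_C)))  # eigenvectors agree up to sign
```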

gunes