
Assume we have a matrix `X = randn(5,3)`. I compute two decompositions:

1) `[S D1 V1] = svd(X);`

2) `[V2 D2] = eig(X'*X);`
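For readers without MATLAB, here is a rough NumPy sketch of the same two computations. It assumes `numpy.linalg.svd` (economy form, like `svd(X,'econ')`) stands in for `svd` and `numpy.linalg.eigh` (for symmetric matrices) stands in for `eig` on `X'*X`; the random seed is arbitrary. It checks that the squared singular values equal the eigenvalues of `X'*X`, and that each right singular vector matches an eigenvector up to sign once the eigenvalue order is reversed.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
X = rng.standard_normal((5, 3))  # analogue of MATLAB's randn(5,3)

# 1) SVD: X = S @ diag(d1) @ V1.T ; NumPy returns V1 already transposed (Vt)
S, d1, V1t = np.linalg.svd(X, full_matrices=False)
V1 = V1t.T

# 2) Eigendecomposition of the symmetric matrix X'X.
#    eigh returns real eigenvalues in ascending order.
d2, V2 = np.linalg.eigh(X.T @ X)

# Squared singular values (descending) equal the eigenvalues (ascending), reversed.
print(np.allclose(d1**2, d2[::-1]))  # True

# Each column of V1 matches the corresponding reversed column of V2 up to sign.
for j in range(3):
    v1, v2 = V1[:, j], V2[:, ::-1][:, j]
    print(np.allclose(v1, v2) or np.allclose(v1, -v2))  # True
```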

I am getting:

V1 =

   -0.6220    0.5046    0.5987
   -0.6549   -0.7544   -0.0446
   -0.4292    0.4198   -0.7997

and

V2 =

    0.5987    0.5046    0.6220
   -0.0446   -0.7544    0.6549
   -0.7997    0.4198    0.4292

First question: how should we interpret the difference between V1 and V2? Why do some negative values become positive, and why are the columns in reverse order?
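One way to see that V1 and V2 carry the same information is to undo both arbitrary choices: sort the eigenpairs by decreasing eigenvalue (the SVD convention), then flip each column's sign so it points the same way as the matching SVD vector. The helpers `sort_descending` and `align_signs` below are hypothetical names used for illustration, and NumPy stands in for MATLAB.

```python
import numpy as np

def sort_descending(eigvals, eigvecs):
    """Reorder eigenpairs so eigenvalues are in decreasing order (SVD's convention)."""
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def align_signs(V, V_ref):
    """Flip the sign of each column of V whose dot product with the matching
    column of V_ref is negative (eigenvector signs are arbitrary)."""
    signs = np.sign(np.sum(V * V_ref, axis=0))
    signs[signs == 0] = 1.0  # leave columns orthogonal to the reference unchanged
    return V * signs

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))

_, d1, V1t = np.linalg.svd(X, full_matrices=False)
V1 = V1t.T
d2, V2 = np.linalg.eigh(X.T @ X)

d2_sorted, V2_sorted = sort_descending(d2, V2)
V2_aligned = align_signs(V2_sorted, V1)

print(np.allclose(d2_sorted, d1**2))  # True
print(np.allclose(V2_aligned, V1))    # True
```

Aligning to a reference is only for comparison; on its own, any fixed sign convention works equally well.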

Second question: in principal component analysis, one can compute the principal components (PCs) as Z = S*D1 or Z = X*V2. But here S*D1 is not equal to X*V2; it equals X*V1. So the PCs are Z = X*V1, not X*V2, right?
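A small NumPy check of the identity behind this question: since X = S*D1*V1', multiplying by V1 gives X*V1 = S*D1 exactly, while X*V2 (with columns reordered by decreasing eigenvalue) gives the same scores with some columns multiplied by -1. As in the question, X is not centered here; for PCA proper one would usually center the columns of X first. NumPy stands in for MATLAB.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))

S, d1, V1t = np.linalg.svd(X, full_matrices=False)
V1 = V1t.T
d2, V2 = np.linalg.eigh(X.T @ X)

Z_svd = S * d1          # same as S @ np.diag(d1): scale column j of S by d1[j]
Z_v1 = X @ V1           # projections onto the right singular vectors
Z_v2 = X @ V2[:, ::-1]  # eigenvectors reversed into decreasing-eigenvalue order

print(np.allclose(Z_svd, Z_v1))  # True: S*D1 equals X*V1 up to rounding

# X*V2 gives the same scores, each column possibly multiplied by -1.
same_up_to_sign = [
    bool(np.allclose(Z_svd[:, j], Z_v2[:, j]) or np.allclose(Z_svd[:, j], -Z_v2[:, j]))
    for j in range(3)
]
print(same_up_to_sign)  # [True, True, True]
```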

  • The sign of the components is arbitrary and does not matter; see here: http://stats.stackexchange.com/questions/88880. Regarding the order: Matlab's `eig` function tends to order the eigenvectors by increasing eigenvalue, while the `svd` function returns them in decreasing order. Hence the order is flipped. One should never rely on this ordering; instead, re-order the components by their eigenvalues. You can compute `Z` as `X*V1` or as `X*V2` and you will get **the same thing**, just possibly with different signs and in a different order. – amoeba Feb 17 '16 at 14:47
  • Thank you for your comment. In fact, when using `eig`, `V2` may turn out to be complex as the matrix dimension becomes large, whereas with `svd`, `V1` is always real. How can this be interpreted? Do you think computing `Z=X*V1` is preferable? – Christina Feb 17 '16 at 14:57
  • SVD is numerically more stable and is usually the preferred way, see http://stats.stackexchange.com/a/87536. Complex values indicate some numerical problems along the way; I would guess that the imaginary part is around the machine precision and so you can write `V2=real(V2)` and it's going to be fine. But it's better to use SVD. – amoeba Feb 17 '16 at 15:03
  • So in principal component regression (PCR), one can write `Y = X*beta + e = S*D1*V1'*beta + e = Z*V1'*beta + e = Z*alpha + e`, with `alpha = V1'*beta`, since `Z = S*D1 = X*V1`. Am I right? Thank you very much for your help. – Christina Feb 17 '16 at 15:10
  • Yes. But you would usually use only a few components in PCR, not all of them. – amoeba Feb 17 '16 at 15:13
  • Of course, it makes sense to discard the least informative components. My last question: I am wondering why the author of the article "Use of the Singular Value Decomposition in Regression Analysis" complicated things on page 5. Why didn't he simply take `Z=U*theta`? (I am only talking about page 5 of the article.) It would be great if you could turn your comments into an answer :-) – Christina Feb 17 '16 at 15:18
  • Please give a link to the paper, I am not sure which one you are talking about. – amoeba Feb 17 '16 at 16:48
  • http://www.ime.unicamp.br/~marianar/MI602/material%20extra/svd-regression-analysis.pdf – Christina Feb 17 '16 at 21:13
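The PCR formulation discussed in the comments (`Z = S*D1 = X*V1`, `Y = Z*alpha + e`, keeping only a few components) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the method of the linked article: `pcr_fit` is a hypothetical helper, there is no intercept and no centering, and the data are synthetic. Because the columns of `Z_k = S_k*D_k` are orthogonal, the least-squares `alpha` has the closed form `S_k'*y ./ d_k`.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression using the first k components (hypothetical helper).

    Model: y ~ Z_k @ alpha with Z_k = X @ V_k = S_k * d_k. Returns beta = V_k @ alpha,
    i.e. coefficients in the original X basis. No intercept, no centering."""
    S, d, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T                    # first k right singular vectors (p x k)
    # Z_k' Z_k = diag(d_k**2) and Z_k' y = diag(d_k) S_k' y, so alpha = S_k' y / d_k
    alpha = (S[:, :k].T @ y) / d[:k]
    return V_k @ alpha

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 4))
beta_true = np.array([1.0, -2.0, 0.5, 0.0])   # synthetic coefficients
y = X @ beta_true + 0.1 * rng.standard_normal(50)

beta_full = pcr_fit(X, y, k=4)                 # all components kept
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_full, beta_ols))        # True: with all components, PCR equals OLS

beta_2 = pcr_fit(X, y, k=2)                    # keep only the 2 leading components
```

Keeping all components reproduces ordinary least squares; dropping the trailing ones restricts `beta` to the span of the leading right singular vectors, which is the point of PCR.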
