
Thanks to @amoeba, I learned that standardized eigenvectors are sometimes calculated, i.e., each eigenvector is divided by the square root of its eigenvalue.

Now, when I want to do principal component regression (PCR), I first calculate the components that I subsequently use in regression. There are two procedures that can be used:

  1. I calculate my components once as raw components, that is, I do not standardize the eigenvectors.

  2. I calculate my components as components based on standardized eigenvectors.

I then do PCR once with the components from procedure 1, and once with the components from procedure 2.

My questions:

  • What are the implications (a) for the coefficients, (b) for the interpretation in the two cases?
  • What are the main differences?
  • I know that the variance of the components will differ, but how does that change the behaviour in regression, or does it at all?
  • Is there any recommendation as to whether I should use one rather than the other?
    maybe you will want to link to where amoeba has dropped the idea you are starting from? Because your question is unclear without that context. – ttnphns Apr 10 '15 at 13:45
  • I tried to make it clearer. She didn't really drop the idea, rather uncovered gaps in my knowledge. – MaHo Apr 10 '15 at 17:58
  • In one case the components are standardized, in the other not. –  Apr 12 '15 at 07:48
  • PCA eigenvectors can be multiplied (not divided!) by the square roots of the eigenvalues to obtain *loadings*. In principal component regression one uses principal components, i.e. projections on the eigenvectors, and not eigenvectors themselves, so I am not sure what exactly you mean by your approach #2. Perhaps you mean standardized principal components (scaled to unit variance). If so, then notice that any scaling of individual predictors does not influence the regression at all, so the answer to your question is that **it does not matter** (see the numerical check after these comments). – amoeba Jul 08 '15 at 15:06
  • It's common to express eigenvectors as unit vectors. I don't know what dividing each component by its eigenvalue would mean. – duffymo Jul 08 '15 at 16:29
  • @duffy That would produce a *dual basis* for the eigenbasis. It's a useful construct. See https://stats.stackexchange.com/a/444058/919 for some details. – whuber Jul 09 '21 at 16:01
  • Thank you for pointing it out. I was ignorant. – duffymo Jul 09 '21 at 16:44
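
To make amoeba's point concrete, here is a minimal numerical check (an illustrative sketch using NumPy on simulated data, not code from the thread): rescaling the principal component scores rescales the regression coefficients by exactly the inverse factors, but leaves the fitted values, residuals, and $R^2$ untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                     # simulated predictors
y = X @ rng.normal(size=5) + rng.normal(size=100)

# Principal components via the SVD of the centered predictor matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 3
Z1 = Xc @ Vt[:k].T                                # procedure 1: raw component scores
Z2 = Z1 / Z1.std(axis=0, ddof=1)                  # procedure 2: unit-variance scores

# OLS on each score matrix; the scores are centered, so center y and skip the intercept.
yc = y - y.mean()
b1, *_ = np.linalg.lstsq(Z1, yc, rcond=None)
b2, *_ = np.linalg.lstsq(Z2, yc, rcond=None)

print(b1 * Z1.std(axis=0, ddof=1))                # equals b2: coefficients absorb the scaling
print(b2)
print(np.allclose(Z1 @ b1, Z2 @ b2))              # True: identical fitted values
```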

2 Answers


I'm reviewing this and found a nice paper, *Parameter Estimation in Factor Analysis: Maximum Likelihood versus Principal Component*, which makes me think that the eigenvectors are indeed standardized, if only for convenient interpretation. The authors operate on a standardized $n \times p$ data matrix $\mathbf{Z}$, whose covariance matrix (i.e., the correlation matrix) has ones on the diagonal, and thus there are $p$ units of variance (the trace) "to distribute".

For eigenpairs $(\lambda_k, \mathbf{e}_k)$, $k = 1, \ldots, p$, the sum of the eigenvalues is still $p$, and the proportion of variance explained by the $k$-th component is $\lambda_k / p$. To have the nice interpretation of distributing these $p$ units of variance, you'd clearly have to be working with unit-length eigenvectors (though the paper doesn't say so explicitly).
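
As a quick numerical check of this bookkeeping (a sketch with simulated data, not taken from the paper), the eigenvalues of the correlation matrix of any data set do sum to $p$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 4
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # correlated columns

# Standardized data matrix Z: its covariance matrix is the correlation
# matrix, which has ones on the diagonal, so its trace is p.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = Z.T @ Z / (n - 1)

eigvals = np.linalg.eigvalsh(R)[::-1]   # lambda_k, largest first
print(eigvals.sum())                    # == p: the variance "budget" being distributed
print(eigvals / p)                      # proportion of variance per component
```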

Ben Ogorek

It might have something to do with whether your predictor variables are standardized or not (centered they must be!). If they are in original form, some eigenvector will be extremely large so standardizing them is a good idea (Why?). I don't see the point in standardizing them if your PCA used standardized (commensurating) variables already.