Why does PCA's failure "to explicitly model error variance" make it difficult to interpret components?

Question

I've heard statements like this many times over the years, and the point is perhaps expressed most clearly by Preacher & MacCallum (2003), a paper that is popular on stats.stackexchange.com (e.g., it is mentioned twice in this question thread). Preacher & MacCallum write on p. 20 that

PCA does not explicitly model error variance, which renders substantive interpretation of components problematic. This is a problem that was recognized over 60 years ago (Cureton, 1939; Thurstone, 1935; Wilson & Worcester, 1939; Wolfle, 1940), but misunderstandings of the significance of this basic difference between PCA and EFA still persist in the literature.

I could not find all of these old papers, and the one I did find, Wilson and Worcester (1939), did not allow me to reach a clear conclusion about why failing to explicitly model error variance should make the substantive interpretation of components problematic.

Cureton, E. E. (1939). The principal compulsions of factor analysts. Harvard Educational Review, 9, 287-295.
Preacher, K. J., & MacCallum, R. C. (2003). Repairing Tom Swift's electric factor analysis machine. Understanding Statistics: Statistical Issues in Psychology, Education, and the Social Sciences, 2(1), 13-43.
Thurstone, L. L. (1940). Current issues in factor analysis. Psychological Bulletin, 37(4), 189.
Wilson, E. B., & Worcester, J. (1939). Note on factor analysis. Psychometrika, 4(2), 133-148.
Wolfle, D. (1940). Factor analysis to 1940. Psychometric Monographs.

"EFA vs PCA" is a very extensively discussed topic, on this site too. Read the threads http://stats.stackexchange.com/q/1576/3277, http://stats.stackexchange.com/q/123063/3277, http://stats.stackexchange.com/q/94048/3277, and others (inspect the links in the comments there). The issue of "interpretation" is touched on, e.g., in http://stats.stackexchange.com/a/123089/3277. — ttnphns, Mar 15 '17 at 08:48
PCA is different from FA, but IMHO this particular distinction is a red herring. That PCA "does not explicitly model error variance" means that PCA is not a probabilistic model. That's true of its standard formulation, but PPCA *is* a probabilistic model, and it is mathematically equivalent to PCA. I wrote a lot about that in my answer here: http://stats.stackexchange.com/questions/123063. Looking now at Preacher & MacCallum, I see a footnote on the same page claiming that in FA "models are testable" whereas in PCA "they are not". Again, this is wrong or at least misleading: one can test PPCA just as one can test FA. — amoeba, Mar 15 '17 at 09:21
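As an illustration of the claim that a PPCA model can be tested much like an FA model, here is a minimal sketch of a likelihood-ratio (chi-square) test of fit for a q-component PPCA model against the saturated Gaussian model. It assumes multivariate normal data and a large sample, and it uses the closed-form maximum-likelihood PPCA solution of Tipping & Bishop (1999). The function name `ppca_fit_test` is hypothetical. The free-parameter count `d*q + 1 - q*(q-1)/2` accounts for the rotational indeterminacy of the loadings. Under the null hypothesis, the statistic is asymptotically chi-square with `df` degrees of freedom, so a p-value could be obtained from, e.g., `scipy.stats.chi2.sf(lr_stat, df)`.

```python
import numpy as np

def ppca_fit_test(X, q):
    """Hypothetical helper: LR test of a q-component PPCA model vs. an
    unrestricted covariance matrix, assuming multivariate normal data."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                          # ML sample covariance (divisor n)

    # Closed-form ML PPCA: sort eigenpairs in descending order
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    lam, U = eigvals[order], eigvecs[:, order]
    sigma2 = lam[q:].mean()                    # mean of discarded eigenvalues
    W = U[:, :q] * np.sqrt(lam[:q] - sigma2)   # scaled leading eigenvectors
    C = W @ W.T + sigma2 * np.eye(d)           # model-implied covariance

    # -2 log(L_model / L_saturated) = n [ln|C| + tr(C^-1 S) - ln|S| - d]
    _, logdet_C = np.linalg.slogdet(C)
    _, logdet_S = np.linalg.slogdet(S)
    lr_stat = n * (logdet_C + np.trace(np.linalg.solve(C, S)) - logdet_S - d)

    # Free covariance parameters: saturated model vs. PPCA
    n_params = d * q + 1 - q * (q - 1) // 2
    df = d * (d + 1) // 2 - n_params
    return lr_stat, df

# Toy usage on arbitrary correlated data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
lr_stat, df = ppca_fit_test(X, q=2)
print(lr_stat, df)
```

The same recipe with a diagonal $\Psi$ in place of $\sigma^2 I$ (and `d*q + d - q*(q-1)/2` parameters) gives the familiar chi-square test for ML factor analysis; only the fitting step differs.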
@user1205901, FA does not only model error variance; it models the correlation (or covariance) matrix itself, more or less directly depending on the extraction method. — ttnphns, Mar 15 '17 at 09:32
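To make the contrast concrete, the covariance structures can be written side by side. The notation is assumed here: $\Lambda$ ($d \times q$) for FA loadings, $\Psi$ for the diagonal matrix of uniquenesses, and $W$, $\sigma^2$ for PPCA as in Tipping & Bishop (1999). FA and PPCA both posit a model for the population covariance $\Sigma$ with an explicit error term; standard PCA only decomposes the sample covariance $S$.

$$
\begin{aligned}
\text{FA:}\quad & \Sigma = \Lambda\Lambda^\top + \Psi, \qquad \Psi = \operatorname{diag}(\psi_1, \dots, \psi_d) \\
\text{PPCA:}\quad & \Sigma = W W^\top + \sigma^2 I \\
\text{PCA:}\quad & S = U \operatorname{diag}(\lambda_1, \dots, \lambda_d)\, U^\top \quad \text{(no error term)}
\end{aligned}
$$

PPCA is thus the special case of the FA structure in which all uniquenesses are constrained to be equal.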
@amoeba, by the way, could you point me (a lazy one) to ready-made pseudocode, readable code, or a clearly described algorithm for PPCA? I'd like to rewrite it in SPSS syntax without "inventing" it from scratch. Thanks. — ttnphns, Mar 15 '17 at 09:39
@ttnphns: PPCA can be implemented via an EM algorithm, similar to FA. However, one can prove that the final solution (the one the EM algorithm converges to) can be expressed via standard PCA. So there is no need to implement the EM algorithm; one can use PCA as a shortcut. In that case there is almost nothing to implement, just a couple of formulas. So which approach do you want to use in SPSS? — amoeba, Mar 15 '17 at 13:55
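For comparison with the PCA shortcut, here is a minimal sketch of the EM route, working directly on a sample covariance matrix $S$. It uses the covariance-form updates from Tipping & Bishop (1999): with $M = W^\top W + \sigma^2 I$, the new loadings are $\tilde W = S W (\sigma^2 I + M^{-1} W^\top S W)^{-1}$ and the new noise variance is $\tilde\sigma^2 = \tfrac{1}{d}\operatorname{tr}(S - S W M^{-1} \tilde W^\top)$. The function name `ppca_em`, the random start, and the fixed iteration count are assumptions for illustration. There is no convergence check. The toy $S$ is built with known eigenvalues so that the EM result can be compared with the closed form (the mean of the discarded eigenvalues).

```python
import numpy as np

def ppca_em(S, q, n_iter=1000, seed=0):
    """Hypothetical helper: EM for PPCA on a d x d sample covariance S."""
    S = np.asarray(S, dtype=float)
    d = S.shape[0]
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d, q))        # random starting loadings
    sigma2 = np.trace(S) / d           # crude starting noise variance
    I_q = np.eye(q)
    for _ in range(n_iter):
        # Both updates use W and sigma2 from the previous iteration
        M_inv = np.linalg.inv(W.T @ W + sigma2 * I_q)
        SW = S @ W
        W_new = SW @ np.linalg.inv(sigma2 * I_q + M_inv @ W.T @ SW)
        sigma2 = np.trace(S - SW @ M_inv @ W_new.T) / d
        W = W_new
    return W, sigma2

# Toy covariance with known eigenvalues 9, 4, 1.0, 0.8, 0.6, 0.5
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))
S = Q @ np.diag([9.0, 4.0, 1.0, 0.8, 0.6, 0.5]) @ Q.T
W, sigma2 = ppca_em(S, q=2)
# sigma2 is expected to approach the mean of the 4 discarded eigenvalues
print(sigma2)
```

The fitted $W$ is only determined up to a rotation, so it will generally not equal the scaled PCA eigenvectors column by column; the implied covariance $W W^\top + \sigma^2 I$ is what matches the closed-form solution.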
@amoeba, the via-PCA one (it seems easier), since you say the two give equivalent results. — ttnphns, Mar 15 '17 at 14:05
@ttnphns Take a look at [Tipping & Bishop 1999](http://www.di.ens.fr/~fbach/courses/fall2010/Bishop_Tipping_1999_Probabilistic_PCA.pdf). You don't need to read the whole paper, 1 page is enough! Look at the notation in Eq (2): $W$ are loadings, and $\sigma^2$ is "shared uniqueness" (so $\sigma^2 I$ plays the same role as $\Psi$ in factor analysis). You want to obtain $W$ and $\sigma^2$. Solution for $\sigma^2$ is given by Eq (8): it's the mean of all "left out" PCA eigenvalues. Solution for $W$ is given by Eq (7): it's PCA eigenvectors scaled by square roots of eigenvalues minus $\sigma^2$. — amoeba, Mar 15 '17 at 20:45
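The two formulas can be sketched in a few lines of NumPy. This is a minimal sketch under these assumptions: the sample covariance uses divisor $n$ (the ML estimate), $q < d$, and the arbitrary rotation $R$ in Eq (7) is set to the identity. The function name `ppca_closed_form` is hypothetical.

```python
import numpy as np

def ppca_closed_form(X, q):
    """Hypothetical helper: ML PPCA via PCA (Tipping & Bishop 1999, Eqs 7-8, R = I)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                       # ML sample covariance (divisor n)
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    lam, U = eigvals[order], eigvecs[:, order]
    sigma2 = lam[q:].mean()                 # Eq (8): mean of the left-out eigenvalues
    W = U[:, :q] * np.sqrt(lam[:q] - sigma2)  # Eq (7): eigenvectors scaled by sqrt(lambda - sigma^2)
    return W, sigma2

# Toy usage on arbitrary correlated data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
W, sigma2 = ppca_closed_form(X, q=2)
print(W.shape, sigma2)
```

One way to see the role of $\sigma^2$: the implied covariance $W W^\top + \sigma^2 I$ reproduces the $q$ leading eigenvalues of $S$ exactly and replaces all the others by their mean $\sigma^2$, so the loadings are the PCA loadings with that common "error" variance subtracted.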
