Why do PCA and Factor Analysis return different results in this example?

Question

The following question is about Exercise 14.15 from "The Elements of Statistical Learning" by Hastie, Friedman and Tibshirani.

Generate $200$ observations of three variates $X_1, X_2 , X_3$ according to \begin{align}X_1 &= Z_1 \\ X_2 &= X_1 + 0.001 \cdot Z_2 \\ X_3 &= 10 \cdot Z_3 \end{align} where $ Z_1, Z_2, Z_3 $ are independent standard normal variables. Compute the leading principal component and factor analysis directions. Hence show that the leading principal component aligns itself in the maximal variance direction $X_3$, while the leading factor essentially ignores the uncorrelated component $X_3$, and picks up the correlated component $X_2 + X_1$ (Geoffrey Hinton, personal communication).

Why? I thought that they are both "powered by" the same matrix decomposition? What have I missed?

The principal reason in this particular example is (I'm sure) that the PCA was based on covariances while the FA was based on correlations: the decomposed matrices were different. This looks like a bad example for really understanding the differences between PCA and FA. — ttnphns, Nov 14 '13 at 08:52
@ttnphns Have you got a good example? (This difference is a tricky one for some people to understand; I have had clients who couldn't get it, and, usually, the two methods do give similar results). — Peter Flom, Nov 14 '13 at 11:18
Not now, @Peter. But I'm sure you can suggest an example at least as good as any I could. The question, however, is what the OP is really bothered by: what do they really want to grasp? — ttnphns, Nov 14 '13 at 11:40
@PeterFlom this paper has an example: http://www.ophi.org.uk/wp-content/uploads/Widaman-1993.pdf — Jeremy Miles, Nov 14 '13 at 17:07
Thanks @JeremyMiles ! I vaguely remember that paper from back when I was in grad school. I will take a look — Peter Flom, Nov 14 '13 at 17:14
@ttnphns: I think in this particular example both PCA and FA are supposed to be based on covariances. I have posted an answer trying to elaborate on that. — amoeba, Jan 26 '15 at 00:52
Hello, @power! Have you had a chance to look at my answer? Please do let me know if you have any further questions about it. — amoeba, Feb 10 '15 at 14:32
Hello @amoeba, I have looked at it superficially. I'll make time this weekend to really think about it. Thank you for your help. I have up-voted. — power, Feb 11 '15 at 05:39
A reminder one year later (after having received an upvote today) :-) Let me know if anything remains unclear. — amoeba, May 22 '16 at 22:15

score 6 · Answer 1 · edited Apr 13 '17 at 12:44

The covariance matrix in this example is given by $$\mathbf C = \left(\begin{array}{ccc} 1 & \sim 1 & 0 \\ \sim 1 & \sim 1 & 0 \\ 0 & 0 & 100\end{array}\right).$$
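As a rough check on the matrix above, here is a minimal simulation sketch of the exercise using only the Python standard library. The seed, the helper name `sample_cov`, and the choice of the unbiased estimator are my own assumptions, not part of the exercise.

```python
import random

random.seed(0)  # arbitrary seed, chosen only for reproducibility
n = 200

# Simulate the three variates from the exercise.
rows = []
for _ in range(n):
    z1, z2, z3 = (random.gauss(0.0, 1.0) for _ in range(3))
    x1 = z1
    rows.append((x1, x1 + 0.001 * z2, 10.0 * z3))


def sample_cov(data):
    """Unbiased sample covariance matrix of a list of equal-length tuples."""
    m = len(data)
    p = len(data[0])
    means = [sum(r[j] for r in data) / m for j in range(p)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in data) / (m - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


C = sample_cov(rows)
for row in C:
    print(" ".join("%8.3f" % c for c in row))
```

With only 200 observations, the sample covariances between $X_3$ and the other two variates will not be exactly zero, and the variance of $X_3$ will only be roughly 100; the matrix in the text is the idealised population version.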

To compare PCA and FA, think about how PCA/FA loadings reconstruct the covariance matrix.

The loadings of the first principal component in PCA are given by a vector $\mathbf v$ that minimizes the reconstruction error $\|\mathbf C - \mathbf v \mathbf v^\top \|$. As is well known, it is the leading eigenvector of $\mathbf C$ scaled by the square root of its eigenvalue, and in this case it points in the $(0,0,1)$ direction (in order to reproduce the variance of $X_3$, which would otherwise be a major source of reconstruction error).

In contrast, the loadings of the first factor in FA are given by a vector $\mathbf v$ that minimizes the reconstruction error $\|\mathbf C - \mathbf v \mathbf v^\top - \boldsymbol \Psi \|$, where $\boldsymbol \Psi$ is a diagonal matrix of uniquenesses. This is equivalent to saying that it minimizes the reconstruction error $\|\mathrm{offdiag}\{\mathbf C - \mathbf v \mathbf v^\top\}\|$, i.e. FA does not care about reconstructing the diagonal. Think about $\mathbf C$ with its diagonal erased: $$\mathrm{offdiag}\{\mathbf C\}=\left(\begin{array}{ccc} & \sim 1 & 0 \\ \sim 1 & & 0 \\ 0 & 0 & \end{array}\right).$$ The goal of FA is to reconstruct this part of $\mathbf C$, so the loadings of the first factor point in the $(1,1,0)$ direction, in order to reproduce the off-diagonal covariance between $X_1$ and $X_2$.
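To make the two reconstruction criteria concrete, here is a small sketch that evaluates both errors for the two candidate loading vectors discussed above. It assumes the idealised population matrix (the $\sim 1$ entries rounded to exactly 1) and the Frobenius norm; the helper name `residual_norm` is hypothetical. It does not fit a factor model, it only compares the two candidates.

```python
import math

# Idealised population covariance matrix (the "~1" entries rounded to 1).
C = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 100.0],
]


def residual_norm(C, v, skip_diagonal):
    """Frobenius norm of C - v v^T, optionally ignoring the diagonal."""
    total = 0.0
    for i in range(len(v)):
        for j in range(len(v)):
            if skip_diagonal and i == j:
                continue
            total += (C[i][j] - v[i] * v[j]) ** 2
    return math.sqrt(total)


candidates = {
    "X3 direction (0, 0, 10)": [0.0, 0.0, 10.0],
    "X1 + X2 direction (1, 1, 0)": [1.0, 1.0, 0.0],
}

# For each candidate, store (full-matrix error, off-diagonal-only error).
errors = {}
for name, v in candidates.items():
    errors[name] = (
        residual_norm(C, v, skip_diagonal=False),
        residual_norm(C, v, skip_diagonal=True),
    )
    print("%-28s full: %7.3f   off-diagonal: %7.3f" % ((name,) + errors[name]))
```

On the full matrix, $(0,0,10)$ leaves an error of 2 while $(1,1,0)$ leaves an error of 100, so the PCA criterion prefers the $X_3$ direction. Once the diagonal is ignored, $(1,1,0)$ leaves no error at all while $(0,0,10)$ leaves $\sqrt 2 \approx 1.41$, so the FA criterion prefers the $X_1 + X_2$ direction.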

Note that this analysis is based on the covariance matrix. Conducting an analysis based on the correlation matrix would (in this case) lead both PCA and FA to yield similar outcomes.
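The effect of switching from covariances to correlations can be sketched for the PCA side with a plain power iteration. This again assumes the idealised population matrix; the function name `leading_loadings` and the iteration count are my own choices, and power iteration is used only because it needs no external library.

```python
import math


def leading_loadings(M, iterations=200):
    """Leading eigenvector of a symmetric PSD matrix, scaled by sqrt(eigenvalue).

    Plain power iteration: fine for a 3x3 illustration, not for real use.
    """
    p = len(M)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(v[i] * M[i][j] * v[j] for i in range(p) for j in range(p))
    return [math.sqrt(eigenvalue) * x for x in v]


# Idealised population covariance matrix (the "~1" entries rounded to 1).
C = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 100.0],
]

# Correlation matrix: divide each entry by the two standard deviations.
sd = [math.sqrt(C[i][i]) for i in range(3)]
R = [[C[i][j] / (sd[i] * sd[j]) for j in range(3)] for i in range(3)]

pca_cov = leading_loadings(C)
pca_corr = leading_loadings(R)
print("PCA loadings on covariances: ", [round(x, 6) for x in pca_cov])
print("PCA loadings on correlations:", [round(x, 6) for x in pca_corr])
```

On covariances the leading PCA loadings are approximately $(0,0,10)$, whereas on correlations they are approximately $(1,1,0)$. Since standardizing leaves the off-diagonal pattern unchanged in this idealised case, the off-diagonal argument for FA still points to $(1,1,0)$, so the two methods agree.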

My answer to the opposite question might be of interest:

Under which conditions do PCA and FA yield similar results?

For many more details about the PCA vs FA issue, see my [very long] answer to this question:

Is there any good reason to use PCA instead of EFA? Also, can PCA be a substitute for factor analysis?
