
Common factor analysis entails the eigendecomposition and subsequent interpretation of $\mathbf{C}$, which is often described as the correlation matrix with its diagonal elements replaced by approximate "communalities": $\mathbf{C} = \mathbf{R} - \text{diag}(\mathbf{R}^{+})^{+}$, where $\mathbf{A}^{+}$ indicates the Moore-Penrose inverse of matrix $\mathbf{A}$; the extraction is possibly followed by rotations.

Is common factor analysis ever based on the eigendecomposition of a transformation of the covariance matrix, rather than the correlation matrix? (And I don't mean by first transforming the covariance matrix into the correlation matrix.)
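
For concreteness, here is a minimal NumPy sketch of the construction described above (my own illustration, not part of the original question): the simulated data, the variable names, and the choice of a single retained factor are assumptions, and no iteration or rotation is performed.

```python
# Minimal sketch (illustrative only): principal-factor extraction from C = R - diag(R^+)^+.
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars = 200, 6
latent = rng.standard_normal((n_obs, 1))                 # one hypothetical common factor
X = latent @ rng.standard_normal((1, n_vars)) + rng.standard_normal((n_obs, n_vars))

R = np.corrcoef(X, rowvar=False)                         # correlation matrix

# Replace the unit diagonal of R with squared multiple correlations (SMCs):
# C = R - diag(R^+)^+, exactly as written in the question.
R_pinv = np.linalg.pinv(R)
C = R - np.linalg.pinv(np.diag(np.diag(R_pinv)))         # diagonal now holds SMCs

# Eigendecompose C and keep the leading factor; no iteration, no rotation here.
eigvals, eigvecs = np.linalg.eigh(C)                     # eigenvalues in ascending order
top = np.argsort(eigvals)[::-1][:1]                      # retain one factor (arbitrary choice)
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
```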

Alexis
  • As far as I know, there is no exact analytical expression for communalities; the formula that you provided is only an approximation used to initialize an iterative algorithm of factor analysis. Is that what you mean? Also, I am not sure that your formula is correct; communalities are approximated by squared multiple correlation coefficients, which are given by $1-\mathrm{diag}(\mathbf{R}^+)^+$. To start the iterations, one subtracts them from the diagonal of $\mathbf{R}$, obtaining on the diagonal $\mathrm{diag}(\mathbf{R})-1+\mathrm{diag}(\mathbf{R}^+)^+$. Your expression is neither this nor that. – amoeba May 01 '14 at 20:53
  • I might redirect you to a long and amusing discussion (especially comments) about PCA http://stats.stackexchange.com/q/62677/3277. In this respect, it doesn't matter that your question is about factor analysis, not PCA: the "dilemma" is the same. – ttnphns May 01 '14 at 21:03
  • @Alexis, very briefly my answer would be: (1) Yes, FA is quite often done on covariances; (2) This analysis is neither better nor worse than that on correlations - it is another analysis; (3) Programmatic implementation of it is sometimes (for some extraction methods, I mean) a bit more cumbersome. – ttnphns May 01 '14 at 21:12
  • @amoeba: I wasn't asking about iterated factor algorithms. And I am fairly comfortable with my representation: the $1$ in your first expression corresponds precisely with the diagonal of the correlation matrix in mine. – Alexis May 01 '14 at 22:19
  • @ttnphns I don't see what your answer (2) was directed at. I was not asking about better or worse, but simply about the applications to the covariance matrix. – Alexis May 01 '14 at 22:20
  • @Alexis: good point, $\mathbf{R}$ has ones on the diagonal. So your formula for the initial guess of communalities is correct. But your statement that the correlation matrix gets communalities on the diagonal is wrong (instead, one subtracts them from the diagonal), and not mentioning that this is only an **approximation** of the communalities and that one needs to perform the iterations until convergence is badly misleading, don't you think? – amoeba May 01 '14 at 23:01
  • I think the point about approximation is good, but I am afraid that I have seen "communalities" refer to the diagonal of the matrix that is decomposed, not to the quantities subtracted from the diagonal of the correlation matrix. – Alexis May 01 '14 at 23:05
  • @Alexis: yes, you're right. I confused "communalities" with "uniquenesses" (which are $1-C$, hence the confusion). Sorry for that :-/ Still, regarding approximation/iterations: at the moment you describe the FA algorithm as (a) find approximate communalities, (b) put them on the diagonal, (c) eigendecompose to find loadings. I thought this would give very inaccurate results; instead, one needs to iterate until convergence. Is it not the case? – amoeba May 02 '14 at 09:08
  • No: what you describe is *iterated* principal factors, and (however correct your point about its use) it is not always used. – Alexis May 02 '14 at 13:25

2 Answers


Yes. If the original variables have comparable scales, there is no reason to use the correlation matrix: using the covariance matrix avoids a non-linear operation (dividing each covariance by the product of the standard deviations), which tends to complicate the theory.
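
To make the substitution concrete, here is a minimal NumPy sketch (my own, not from the answer; the simulated data and the single retained factor are illustrative assumptions) that simply puts the sample covariance matrix in place of $\mathbf{R}$ in the construction from the question:

```python
# Illustrative sketch only: the same construction with the sample covariance matrix S
# substituted for R (reasonable when the variables share comparable scales).
import numpy as np

rng = np.random.default_rng(1)
latent = rng.standard_normal((200, 1))
X = latent @ rng.standard_normal((1, 6)) + 0.5 * rng.standard_normal((200, 6))

S = np.cov(X, rowvar=False)                              # covariance matrix

# C = S - diag(S^+)^+ : each diagonal element becomes the part of that variable's
# variance explained by a regression on the remaining variables.
C = S - np.linalg.pinv(np.diag(np.diag(np.linalg.pinv(S))))

eigvals, eigvecs = np.linalg.eigh(C)
top = np.argsort(eigvals)[::-1][:1]                      # retain one factor (arbitrary)
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])       # loadings in the original units
```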

F. Tusell
  • OK. Digging deeper in this direction: if the original data have comparable scales and variances that are nearly equal, then $\boldsymbol{\Sigma} \approx \mathbf{R}$. If the variances in the observed data are not nearly equal, then they are often transformed (in some disciplines, anyway) so that again $\boldsymbol{\Sigma} \approx \mathbf{R}$. Are you saying that the covariance matrix is used for comparably scaled data, even when the variances of the data are different? – Alexis May 01 '14 at 19:51
  • Also, can you clarify whether, when the covariance matrix is used, $\boldsymbol{\Sigma}$ is simply substituted for $\mathbf{R}$ in the above definition of $\mathbf{C}$? – Alexis May 01 '14 at 19:52
  • The covariance matrix is used when the scales are comparable and variances not much different, and yes, you would simply substitute $\boldsymbol{\Sigma}$ for $\boldsymbol{R}$. – F. Tusell May 02 '14 at 07:26

I have never worked with factor analysis, but your question can be asked about PCA as well, and as @ttnphns commented above, the choice between covariance and correlation matrices is exactly the same there.

It is clear (and already mentioned here) that if the original variables are measured in different and incomparable units, the only reasonable choice is to use the correlation matrix. If the units are the same and the original variances are very similar, then it does not matter which matrix to use, as the two are nearly proportional. So the real question is what to do in a situation when all variables measure the same quantity in the same units but have very different variances.

Let me give you an example (the one I am working with on a daily basis) where the covariance matrix makes more sense. Each variable is the activity of one neuron in the brain. It is measured at different points in time and perhaps in different experimental conditions. Many neurons are recorded simultaneously, so a dataset can encompass e.g. 1000 neurons. PCA can be used to perform dimensionality reduction.

Each variable is a firing rate (number of spikes per second). But some neurons fire more and some fire less; some change their firing rate more and some less. So individual variances can be very different (we are talking several orders of magnitude). These differences are clearly "important": a neuron that fires more (and changes its firing a lot) is presumably more involved in the task in question than a neuron that almost does not fire at all. The variables are arguably not "equal", and so it makes sense to use the covariance matrix directly.

A related issue with the correlation matrix is that if a neuron is almost silent and has a tiny variance, one would divide by almost zero and greatly amplify what is probably just noise. Getting rid of such neurons (in order to work with the correlation matrix) would be another preprocessing step that would only lead to further complications.
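
A small made-up numerical illustration of this point (the firing-rate values and variable names are my own assumptions, not real recordings): after standardization, a near-silent neuron carries as much variance as an active one, even though its variation is pure noise.

```python
# Made-up illustration: a near-silent "neuron" contributes almost nothing to the
# covariance matrix, but standardization gives its noise a full unit of variance.
import numpy as np

rng = np.random.default_rng(2)
n = 500
signal = rng.standard_normal(n)
active_1 = 10.0 * signal + rng.standard_normal(n)        # strongly modulated neuron
active_2 = 5.0 * signal + rng.standard_normal(n)         # moderately modulated neuron
silent = 0.01 * rng.standard_normal(n)                   # almost silent neuron (pure noise)
X = np.column_stack([active_1, active_2, silent])

cov = np.cov(X, rowvar=False)
cor = np.corrcoef(X, rowvar=False)

print("Share of total variance carried by the silent neuron:")
print("  covariance :", cov[2, 2] / np.trace(cov))       # essentially zero
print("  correlation:", cor[2, 2] / np.trace(cor))       # exactly 1/3 after standardization
```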

Update. Related discussion: PCA on correlation or covariance?

amoeba
  • I think your reasoning is good, but one conclusion may be a little too limited. Yes, when the original variables are not commensurable, the result might only reflect arbitrary choices of units of measurement of the variables. But this does not imply that using the correlation is the "only reasonable choice." In fact, in many cases that might be an *unreasonable* choice due to its sensitivity to outlying data. One might choose to standardize the variables to a unit IQR, for instance, or to use some other basis to identify appropriate units of measurement--or even to re-express data nonlinearly. – whuber May 07 '14 at 14:46