Looking at both the practice of colleagues and the practices instantiated in popular programs (e.g., SPSS and commonly used SPSS syntax), it seems common to use criteria based on a PCA to select the number of factors in a factor analysis.

I am not just talking here about the Kaiser-Guttman rule and scree plots, but also about better-regarded methods such as parallel analysis and the MAP test.

Would it not make more sense to conduct parallel analysis and the MAP test on factors, if the goal is to select the number of factors?

Is this a major problem, or is it basically OK to use the criteria from PCA as part of a guide to selecting the number of factors?

  • I too am confused about why PCA criteria are used everywhere in factor analysis. It seems that factor analysis has very natural criteria to analyze, e.g. the mean uniqueness (a form of mean squared error), which is the same as 1 minus the common variance explained. Plotting this as a function of the number of factors leads to a production possibilities frontier, and then arguments regarding tradeoffs between explanatory power and the number of factors can be meaningfully discussed. (Relatedly, I'm not sure why PCA scree plots use the eigenvalues rather than their cumulative sum.) – Richard DiSalvo Nov 05 '20 at 03:42
  • You can do parallel analysis on FA results as well. For instance, the [psych](https://cran.r-project.org/web/packages/psych/index.html) R package offers both ways (PCA and/or FA). It's just that even as a rough approximation PCA works quite well in this case. If you want to learn more about the subtle distinction between those two approaches, look for some of [@ttnphns](https://stats.stackexchange.com/users/3277/ttnphns)'s nice answers related to this topic. – chl Nov 05 '20 at 07:47
  • @chl OK, I did some reading and now I'm thinking: the scree plot of eigenvalues for PCA corresponds to the marginal gain in explained variance from the addition of each component, but this just happens to be so for PCA. The most sensible thing (maybe) to drop into this role for factor analysis would then be the marginal increase in the variance explained by the factors (equivalent to, and to me more intuitively phrased since it looks like MSE, the marginal decline in the sum of the uniquenesses, i.e. the variances of the errors, from the addition of each factor; see the sketch after these comments). Any thoughts appreciated! – Richard DiSalvo Nov 05 '20 at 21:58
  • It's worth recognizing that FA & PCA perform the same arithmetic over a nearly identical matrix. So while the theory is quite different, and the interpretations are supposed to be different, the output isn't necessarily all that different in practice. As a result, it may be 'good enough for government work'. – gung - Reinstate Monica Nov 06 '20 at 20:32
  • A related thread: https://stats.stackexchange.com/q/241032/3277; especially look in the comments. – ttnphns Nov 07 '20 at 02:25
  • As I've already mentioned there, it is perfectly logical to make the initial estimate of the number of factors in EFA by analysing the eigenvalues of PCA, because PCA is a descriptive method that is nonetheless akin to EFA. We need a _descriptive_ method giving us all existing eigenvalues of the total variance in order to correctly appreciate such "rules" as Kaiser's and Cattell's, and, to an extent, parallel analysis. Eigenvalues of EFA are already the result of _modeling_. – ttnphns Nov 07 '20 at 02:55
  • (cont.) But the goal of the modeling is _to restore correlations_ by the m extracted factors, not to give maximized eigenvalues or a nice scree of those. Once the factor extraction has been done, the factor variances are only of modest value in judging if m is optimal or not. – ttnphns Nov 07 '20 at 02:55
  • (cont.) So, @RichardDiSalvo, the (or at least one) objection to your idea of basing the initial (tentative) choice of m on the eigenvalues or variances found in EFA is that the main goal of EFA is not maximizing explained variance, unlike PCA. In PCA, the more PCs you extract the more variance you explain. In FA, you have to find an optimal number of factors m and not m+1, even if m+1 may still enhance the prediction of correlations (e.g. due to overfitting). – ttnphns Nov 07 '20 at 03:24
  • @gung I don't think the differences are important with many items, but I'm not sure about few items. In my work with 185 items factored into around 5 factors, the scree (eigenvalue) plot and the marginal-gain-in-variance-explained plot (based on a bunch of ML factor models) are nearly identical. I think on this site we have simulation/theoretical evidence that "with many items this is not important" holds in general: https://stats.stackexchange.com/questions/123063/is-there-any-good-reason-to-use-pca-instead-of-efa-also-can-pca-be-a-substitut – Richard DiSalvo Nov 09 '20 at 20:40
  • @ttnphns I understand that with a lot of items the differences aren't likely to matter, but I don't see why the gain in explained variance is a bad plot to drop in instead of the eigenvalue plot (when using factor). I think you might be suggesting that there are other analyses, in addition to the eigenvalue (or similar) plot, for determining the number of factors? That makes sense. But you said "the factor variances are only of modest value in judging if m is optimal or not"; if so, and since these will be related to the eigenvalues in some cases, how should we guess at a reasonable number of factors? – Richard DiSalvo Nov 09 '20 at 20:48
  • @RichardDiSalvo, Throughout my numerous comments to this current question as well as the other Q, I expressed my reasons for feeling that the preliminary criteria, (1) Kaiser's, (2) Cattell's scree, and possibly (3) parallel analysis, should be based on PCA rather than on EFA itself. My _main_ reasons were two: (i) eigenvalues or explained variances corresponding to the m+1, m+2... factors are not "real" or "existing" because only m factors were modeled in the FA; – ttnphns Nov 09 '20 at 22:08
  • (cont.) (ii) in FA itself, the most genuine criterion for the value of m would be the quality of reconstruction of the off-diagonal elements of the correlation (covariance) matrix, and not the amount of variance the factors explain. Those were my main points. – ttnphns Nov 09 '20 at 22:08
  • A question about [Cattell's scree-plot criterion](https://stats.stackexchange.com/q/513911/3277) – ttnphns Mar 17 '21 at 18:22
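
To make the plot discussed in these comments concrete, here is a minimal sketch in R, with simulated data and base R's `factanal()` standing in for whatever extraction method is actually used. It puts the PCA scree eigenvalues next to the marginal gain in common variance explained by the factors (i.e. the marginal drop in the sum of the uniquenesses) as factors are added; all names and data below are purely illustrative.

```r
set.seed(42)

## Toy responses with a 3-factor structure (purely illustrative;
## replace X with your own item matrix).
n <- 500; p <- 12
L <- matrix(0, p, 3)
L[1:4, 1] <- 0.7; L[5:8, 2] <- 0.7; L[9:12, 3] <- 0.7
X <- matrix(rnorm(n * 3), n, 3) %*% t(L) +
  matrix(rnorm(n * p, sd = sqrt(1 - 0.49)), n, p)

## The usual scree-plot ingredients: eigenvalues of the correlation matrix.
pca_eigen <- eigen(cor(X))$values

## For m = 1, 2, ... factors: total common variance explained by an
## ML factor solution, i.e. p minus the sum of the uniquenesses.
max_m <- 6
common_var <- sapply(1:max_m, function(m) {
  fit <- factanal(X, factors = m)
  p - sum(fit$uniquenesses)
})

## Marginal gain in common variance from each added factor,
## plotted next to the PCA eigenvalues.
marginal_gain <- diff(c(0, common_var))
plot(1:max_m, pca_eigen[1:max_m], type = "b", pch = 19,
     xlab = "Number of factors / components", ylab = "Variance")
lines(1:max_m, marginal_gain, type = "b", pch = 1, lty = 2)
legend("topright", legend = c("PCA eigenvalue", "FA marginal common variance"),
       pch = c(19, 1), lty = c(1, 2))
```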

1 Answer

It is basically OK to use the criteria from PCA as part of a guide to selecting the number of factors. Most of the time, FA and PCA results will be in agreement.

Parallel analysis and Velicer's minimum average partial (MAP) are the most reliable and accurate techniques for assessing the number of components or factors to retain, according to Zwick & Velicer.1 The fact that we use PCA instead of FA is largely motivated by historical reasons, and it has been disputed to varying degrees over the last 20 years. Most research has focused on PCA, but Velicer and colleagues discussed some of the adaptations that were developed for principal axis factor analysis, which is widely used in factor analysis, and they concluded that "no satisfactory alternatives are available within factor analysis."2
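
As a concrete sketch, both criteria can be requested directly with the psych package mentioned in the comments; the data below are simulated purely for illustration, and you would substitute your own response matrix:

```r
set.seed(1)
library(psych)

## Simulated item responses with a simple 3-factor structure
## (illustrative only; replace X with your own data).
n <- 500
L <- matrix(0, nrow = 9, ncol = 3)
L[1:3, 1] <- 0.7; L[4:6, 2] <- 0.7; L[7:9, 3] <- 0.7
X <- matrix(rnorm(n * 3), n, 3) %*% t(L) +
  matrix(rnorm(n * 9, sd = sqrt(1 - 0.49)), n, 9)

## Horn's parallel analysis: fa = "both" reports PCA eigenvalues and
## factor eigenvalues, each compared against resampled/random data.
fa.parallel(X, fm = "minres", fa = "both", n.iter = 100)

## Velicer's MAP test, reported together with the VSS criteria.
VSS(X, n = 5, rotate = "varimax", fm = "minres")
```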

Sidenote:

Horn's parallel analysis estimates the distribution of eigenvalues obtained from randomly perturbed versions of the same dataset, hence it gives an idea of the "residual" variance in your data, i.e. what would be expected if there were no signal in your dataset. Despite being rarely used in practice, this resampling-based approach is certainly a better criterion than Kaiser's rule (eigenvalues > 1). Other aspects of parallel analysis are discussed in this related thread: How to correctly interpret a parallel analysis in exploratory factor analysis?. Some statistical packages, like paran, available in R and Stata, even offer a way to correct for bias arising from collinearity due to sampling error (which, of course, is not accounted for when using Kaiser's rule), whereby eigenvalues of PCA components might be greater than or less than 1 purely by chance, and they handle other subtleties, like how to generate the random samples (e.g., standardized values, using the same rank for the simulated dataset as for the observed response matrix, or drawing random variates from a distribution close to the empirical one).3 Many of Velicer's papers and recent publications on parallel analysis contrast PCA and FA approaches to determining the number of factors to retain, so you will likely find additional information on how un-rotated PCA and FA models are able to recover the proper factor structure of a given dataset.
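
For example, a minimal sketch with the paran R package (function and argument names as documented on CRAN; check ?paran if your version differs), again on simulated stand-in data:

```r
set.seed(2)
library(paran)   # install.packages("paran") if needed

## Any numeric response matrix will do; a quick correlated stand-in:
X <- matrix(rnorm(400 * 8), 400, 8) %*% matrix(runif(64, -0.3, 0.7), 8, 8)

## Bias-adjusted parallel analysis on principal components: retain
## components whose adjusted eigenvalue exceeds the chosen centile
## of the eigenvalues obtained from random data.
paran(X, iterations = 5000, centile = 95, graph = TRUE)

## The same idea applied to common factor analysis instead of PCA.
paran(X, iterations = 5000, centile = 95, cfa = TRUE, graph = TRUE)
```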

Replying to @RichardDiSalvo, I believe the direction (marginal increase or decrease of explained variance, in your words) is not that relevant, since in both cases you start with one factor (or component), then add more, and see how the residual variance is accounted for by each of those models. This applies both from a theoretical point of view (start with the most parsimonious model, then add complexity) and from a computational one (it is an iterative procedure where we start with the null model and add parameters one after the other). The uniqueness (1 - communality) is specific to EFA, and ideally it should be identical in all factor solutions, that is, all items should have close loadings on their respective factors. That is the basis of most of classical test theory (tau-equivalent and congeneric measures are examples of models where we relax this assumption).
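
As a quick check of that last point, here is a minimal sketch using base R's factanal() with simulated data: the uniquenesses reported by an EFA are just 1 minus the communalities, i.e. 1 minus the row sums of squared loadings.

```r
set.seed(7)

## Simulated responses with a 2-factor structure (illustrative only).
L <- matrix(c(0.8, 0.7, 0.6, 0,   0,   0,
              0,   0,   0,   0.8, 0.7, 0.6), nrow = 6, ncol = 2)
X <- matrix(rnorm(300 * 2), 300, 2) %*% t(L) +
  matrix(rnorm(300 * 6, sd = 0.6), 300, 6)

fit <- factanal(X, factors = 2)

## Communality of each item = sum of its squared loadings;
## uniqueness = 1 - communality (up to numerical tolerance).
communality <- rowSums(fit$loadings[, ]^2)
cbind(communality,
      uniqueness = fit$uniquenesses,
      total      = communality + fit$uniquenesses)
```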

References

1 Zwick, W.R. and Velicer, W.F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99, 432–442.

2 Velicer, W.F., Eaton, C.A., and Fava, J.L. (2000). Construct explication through factor or component analysis: A review and evaluation of alternative procedures for determining the number of factors or components. In R.D. Goffin and E. Helmes (Eds.), Problems and Solutions in Human Assessment: Honoring Douglas N. Jackson at Seventy (pp. 41–71). Springer.

3 Liu, O.L. and Rijmen, F. (2008). A modified procedure for parallel analysis of ordered categorical data. Behavior Research Methods, 40(2), 556–562.

chl