I have a big dataset with 30000 rows and 60 columns (variables).

I would like to run a factor analysis to find the latent factors underlying these variables.

What kind of factor analysis should I use, and which measures should I check to judge whether the resulting factors are adequate?

user8831872
  • Possible duplicate of [Doing principal component analysis or factor analysis on binary data](https://stats.stackexchange.com/questions/16331/doing-principal-component-analysis-or-factor-analysis-on-binary-data). See also: https://stats.stackexchange.com/questions/215404/is-there-factor-analysis-or-pca-for-ordinal-or-binary-data – T.E.G. Nov 20 '17 at 14:00
  • Both links should be helpful (and there are other similar questions). The answers to those questions are general rather than software-specific (they mention software only as a possible means of implementation), since questions focusing on *programming, debugging, or performing routine operations within a statistical computing platform* are [off-topic](http://stats.stackexchange.com/help/on-topic) here. – T.E.G. Nov 20 '17 at 14:47

1 Answer

For factor analysis of dichotomous data, you should use tetrachoric correlations rather than ordinary Pearson correlations.

The fa() function in the psych package for R lets you specify which type of correlation to factor analyze (tetrachoric or another type) through its `cor` argument.
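
As a minimal sketch of what this could look like, the R snippet below simulates a small binary dataset and passes it to `fa()` with `cor = "tet"`. The simulated data, the variable names (`items`, `f1`, `f2`), the choice of two factors, and the varimax rotation are all assumptions made for illustration; none of them come from the question, and the real data frame would replace `items`.

```r
# install.packages("psych")  # uncomment if psych is not installed
library(psych)

set.seed(1)

# Hypothetical stand-in data: 1000 rows, 6 binary (0/1) items,
# three driven by latent factor f1 and three by latent factor f2.
n  <- 1000
f1 <- rnorm(n)
f2 <- rnorm(n)
items <- data.frame(
  item1 = as.integer(0.8 * f1 + rnorm(n, sd = 0.6) > 0),
  item2 = as.integer(0.8 * f1 + rnorm(n, sd = 0.6) > 0),
  item3 = as.integer(0.8 * f1 + rnorm(n, sd = 0.6) > 0),
  item4 = as.integer(0.8 * f2 + rnorm(n, sd = 0.6) > 0),
  item5 = as.integer(0.8 * f2 + rnorm(n, sd = 0.6) > 0),
  item6 = as.integer(0.8 * f2 + rnorm(n, sd = 0.6) > 0)
)

# Tetrachoric correlation matrix of the binary items (for inspection).
rho <- tetrachoric(items)$rho

# Factor analysis based on tetrachoric rather than Pearson correlations.
# nfactors = 2 and rotate = "varimax" are illustrative choices only.
fit <- fa(items, nfactors = 2, cor = "tet", rotate = "varimax", fm = "minres")

print(round(rho, 2))
print(fit$loadings, cutoff = 0.3)
```

With 60 variables, the number of factors would have to be chosen for the actual data; the value 2 here only matches how the toy data were generated.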

Jeremy Miles
  • PCA and factor analysis are more or less insensitive to the distribution of the *data* because the mathematical object they analyze is the *correlation matrix* (or covariance matrix, when variances are approximately equal, which means the covariance matrix is close to the correlation matrix). Dinno, A. (2009). Exploring the Sensitivity of Horn’s Parallel Analysis to the Distributional Form of Simulated Data. *Multivariate Behavioral Research*, 44(3):362–388. – Alexis Nov 21 '17 at 03:36