I have cross-sectional survey data with over 4000 observations on 41 variables and no missing values. The variables are not normally distributed, so the assumption of multivariate normality is violated. In R, I am using factor analysis (FA) with varimax rotation to find 5 latent variables within these 41 variables; the choice of five factors is theory-based. I plan to run an EFA on a smaller random subset of the 4000 observations and then fit the resulting model to all 4000 observations via CFA. The 5-factor varimax solution estimated with OLS has some cross-loadings. TLI is .889, but the 90% confidence interval for RMSEA is .035 to .038. The fifth factor has only two loadings, both under .4. I used a loading cut-off of .3.
I shouldn't use ML estimation, because it apparently requires normally distributed data. According to lavaan, my options are GLS, WLS, DWLS, or ULS. The diagonally weighted least squares (DWLS) and unweighted least squares (ULS) estimators also have robust variants. Which estimator should I use? Despite reading the literature, I am still unclear on what robustness means here. Why would I use a less robust estimator when robust estimators are available? Should I use the same estimator and rotation for my planned FA on the smaller subset and for the whole sample?
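
To make the planned workflow concrete, here is a minimal sketch in R. It is only an illustration under stated assumptions: it uses lavaan's built-in `HolzingerSwineford1939` data (9 items, 3 factors) as a stand-in for my 41 items and 5 factors, the subset size of 150 is arbitrary, and `estimator = "ULSMV"` is just one of the candidates I am asking about, not a claim that it is the right choice.

```r
library(psych)   # EFA via fa()
library(lavaan)  # CFA via cfa()

# Stand-in data: lavaan's built-in example set, NOT my survey data.
items <- HolzingerSwineford1939[, paste0("x", 1:9)]

# Step 1: EFA on a random subset of the rows (subset size is arbitrary here).
set.seed(123)
efa_rows <- sample(nrow(items), size = 150)
efa_fit <- fa(items[efa_rows, ], nfactors = 3, rotate = "varimax", fm = "ols")
print(efa_fit$loadings, cutoff = 0.3)  # hide loadings below the .3 cut-off

# Step 2: CFA on all rows. The item-to-factor assignment below is the
# textbook Holzinger-Swineford model, used as a placeholder for whatever
# structure the EFA would suggest in the real data.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
# "ULSMV" is one candidate (robust ULS); swapping in another estimator
# string such as "ULS", "DWLS" or "WLSMV" is the part I am unsure about.
cfa_fit <- cfa(model, data = items, estimator = "ULSMV")
fitMeasures(cfa_fit, c("tli", "rmsea", "rmsea.ci.lower", "rmsea.ci.upper"))
```

In my real analysis the `fa()` call would use `nfactors = 5` on the 41 survey items, and the CFA model syntax would be written from the resulting loading pattern.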