
I have a large number of variables (24) to predict a Y/N value, and I would like help writing a procedure that automatically tries the different results of the factor selection and checks how good the resulting regression is. Of course, I also want to save the best model for later use.
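Here is one way to automate that search. It is sketched in Python rather than R, under stated assumptions. With 24 candidates there are 2^24 - 1 = 16,777,215 non-empty subsets, which makes an exhaustive search impractical, so this sketch swaps in greedy forward selection. Each candidate subset is scored by 5-fold cross-validated accuracy of a plain logistic regression. The winning subset is then refit on all rows and serialized with `json`. The helpers (`fit_logistic`, `cv_accuracy`, `forward_select`), the gradient-descent fit, and the toy data are hypothetical illustrations, not part of the thread.

```python
import json
import math
import random


def sigmoid(z):
    # Clamp z so math.exp cannot overflow for extreme linear predictors
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Logistic regression by batch gradient descent; returns [intercept, w1, ..., wp]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w, xi):
    # Classify as 1 ("Y") when the fitted probability is at least 0.5
    return 1 if sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) >= 0.5 else 0


def cv_accuracy(X, y, cols, k=5):
    """k-fold cross-validated accuracy of a logistic model using only the columns in `cols`."""
    n = len(X)
    correct = 0
    for f in range(k):
        test = list(range(f, n, k))  # every k-th row forms fold f
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        w = fit_logistic([[X[i][c] for c in cols] for i in train], [y[i] for i in train])
        correct += sum(predict(w, [X[i][c] for c in cols]) == y[i] for i in test)
    return correct / n


def forward_select(X, y, k=5):
    """Greedily add the column that most improves CV accuracy; stop when none does."""
    remaining = list(range(len(X[0])))
    chosen, best_score = [], 0.0
    while remaining:
        score, col = max((cv_accuracy(X, y, chosen + [c], k), c) for c in remaining)
        if score <= best_score:
            break
        chosen.append(col)
        remaining.remove(col)
        best_score = score
    return chosen, best_score


# Hypothetical toy data: 5 candidate predictors, outcome driven by column 0 only
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [1 if row[0] > 0 else 0 for row in X]

cols, score = forward_select(X, y)
# Refit the chosen model on all rows and serialize it for later use
best_w = fit_logistic([[row[c] for c in cols] for row in X], y)
model_json = json.dumps({"columns": cols, "weights": best_w, "cv_accuracy": score})
print(cols, round(score, 3))
```

The same cross-validation scores are used both to pick the subset and to report it, so the reported accuracy is optimistic. A separate hold-out set, or nested cross-validation, gives a fairer estimate of how the saved model will do on new data.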

mariana soffer
  • You may find these related posts helpful: http://stats.stackexchange.com/questions/6856/aggregating-results-from-linear-model-runs-r and http://stats.stackexchange.com/questions/1812/fa-choosing-rotation-matrix-based-on-simple-structure-criteria – George Dontas Feb 07 '11 at 07:21
  • Mallows' Cp: http://en.wikipedia.org/wiki/Mallows'_Cp and here's a paper on Cp: http://mlrv.ua.edu/2008/vol34_1/Lieberman-Morris.pdf – bill_080 Feb 07 '11 at 19:23
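For reference, the Wikipedia article on Mallows' Cp defines it for a least-squares model with P parameters (including the intercept) as:

$$
C_p = \frac{SSE_p}{S^2} - N + 2P
$$

Here SSE_p is the subset model's residual sum of squares, S^2 is the residual mean square of the full model with all candidate predictors, and N is the sample size. Subsets with a small Cp, close to P, are preferred. Cp is built on least squares. For a Y/N response fit by logistic regression, a likelihood-based criterion such as AIC = 2k - 2 ln(L) plays the analogous role.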

2 Answers


First of all, consider whether factor analysis is the right way to do feature extraction. I would suggest using principal component analysis (PCA) for dimension reduction first, and then using the extracted components as predictor variables. Depending on your setting, you should also use an appropriate cross-validation scheme to assess your predictions.
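The workflow described above can be sketched in Python under stated assumptions. PCA is fit on the predictors only, so the Y/N outcome is left out, as recommended in the comments. A logistic regression is then fit on the first k component scores, and the model is assessed by k-fold cross-validation. The PCA is refit inside each training fold, so held-out rows never influence the components. The power-iteration PCA, the gradient-descent logistic fit, the helper names (`fit_pca`, `project`, `cv_pca_logistic`), and the toy data are hypothetical illustrations. In R, `prcomp()` and `glm(..., family = binomial)` would be the usual tools.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Logistic regression by batch gradient descent; returns [intercept, w1, ..., wp]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w, xi):
    return 1 if sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) >= 0.5 else 0


def fit_pca(X, k, iters=300, seed=0):
    """Column means and first k principal axes of X, via power iteration with deflation."""
    n, p = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[x - m for x, m in zip(row, means)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(p)]
        for _ in range(iters):
            u = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in u))
            v = [x / norm for x in u]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
        axes.append(v)
        # Deflate: subtract this component so the next pass finds the next one
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return means, axes


def project(X, means, axes):
    """Component scores: centered rows projected onto each principal axis."""
    return [[sum((x - m) * a for x, m, a in zip(row, means, ax)) for ax in axes] for row in X]


def cv_pca_logistic(X, y, k, folds=5):
    """Cross-validated accuracy of logistic regression on the first k PC scores."""
    n = len(X)
    correct = 0
    for f in range(folds):
        test = list(range(f, n, folds))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        # PCA sees only the training predictors; the Y/N outcome is never included
        means, axes = fit_pca([X[i] for i in train], k)
        w = fit_logistic(project([X[i] for i in train], means, axes), [y[i] for i in train])
        scores = project([X[i] for i in test], means, axes)
        correct += sum(predict(w, s) == y[i] for s, i in zip(scores, test))
    return correct / n


# Hypothetical toy data: a latent z drives predictors 0-2 and the outcome; 3-5 are noise
random.seed(1)
X, y = [], []
for _ in range(100):
    z = random.gauss(0, 1)
    X.append([z + 0.3 * random.gauss(0, 1) for _ in range(3)] + [random.gauss(0, 1) for _ in range(3)])
    y.append(1 if z > 0 else 0)

results = {k: cv_pca_logistic(X, y, k) for k in (1, 2, 3)}
print(results)
```

Comparing the cross-validated accuracy across several values of k is one simple way to choose how many components to keep. The sign of each principal axis is arbitrary, and the logistic fit absorbs it.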

Andrej
  • I would entirely agree with Andrej. This is one of those rare cases where principal components will probably do a better job than factor analysis. – richiemorrisroe Feb 07 '11 at 13:46
  • Sorry, I have one question: should I include the variable I want to predict in the PCA? – mariana soffer Feb 08 '11 at 01:58
  • No, leave your dependent variable (the Y/N variable in your case) out of the PCA dimension reduction. You can read my article on the same topic at http://goo.gl/lJh5s – Andrej Feb 08 '11 at 08:57

You might find the 'logisticPCA' package useful.

It is an R package for dimensionality reduction of binary data.

https://cran.r-project.org/web/packages/logisticPCA/vignettes/logisticPCA.html

skan