I have a large number of variables (24) with which to predict a Y/N value. I would like help writing a procedure that automatically tries the different results of the factor selection to see how good the resulting regression is, and, of course, I want to save the best model for later use.
You may find these related posts helpful: http://stats.stackexchange.com/questions/6856/aggregating-results-from-linear-model-runs-r http://stats.stackexchange.com/questions/1812/fa-choosing-rotation-matrix-based-on-simple-structure-criteria – George Dontas Feb 07 '11 at 07:21
See Mallows' Cp (http://en.wikipedia.org/wiki/Mallows'_Cp) and this paper on Cp: http://mlrv.ua.edu/2008/vol34_1/Lieberman-Morris.pdf – bill_080 Feb 07 '11 at 19:23
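For reference, here is Mallows' Cp in its usual form. Here n is the number of observations, p is the number of parameters (including the intercept) in the candidate model, SSE_p is that model's residual sum of squares, and $\hat{\sigma}^2$ is the residual mean square of the full model fitted with all candidate predictors:

$$C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2} - n + 2p$$

A candidate subset whose $C_p$ is small and close to $p$ has little bias. Note that $C_p$ is defined for least-squares linear regression. For a Y/N response fitted with logistic regression, the criterion usually used in its place is AIC, $\mathrm{AIC} = -2\log\hat{L} + 2p$, where $\hat{L}$ is the maximized likelihood.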
2 Answers
First of all, consider whether factor analysis is the right way to do feature extraction. I would suggest using principal component analysis (PCA) for dimensionality reduction first, and then using the extracted components as predictor variables. Depending on your setup, you should also use an appropriate cross-validation scheme to assess your predictions.
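Below is a minimal sketch of that approach. It is written in plain Python with only the standard library so that it is self-contained; the question does not say which language is being used, and the thread itself points at R. It assumes the 24 predictors are numeric and that Y/N is coded as 1/0. For each candidate number of components k, it runs k-fold cross-validation. Inside every fold, the PCA is fitted on the training predictors only, and the Y/N column is never part of it. Both folds are then projected onto those components, and a logistic regression is fitted on the training scores. The k with the highest cross-validated accuracy is refitted on all the data and returned as a JSON-serializable dict, so it can be saved and reloaded later. All of the following are illustrative assumptions rather than any library's API: the function names (`fit_pca`, `select_k`, `make_demo_data`, etc.), the power-iteration PCA, the gradient-descent logistic regression, accuracy as the score, and the synthetic demo data.

```python
import json
import math
import random

def sigmoid(z):
    # numerically stable logistic function
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def dot(a, b):
    # zip stops at the shorter list, so dot(w, f) ignores the intercept w[-1]
    return sum(x * y for x, y in zip(a, b))

def fit_pca(X, k, iters=200):
    """Standardize the predictors X (no Y column!) and find the top-k principal axes."""
    n, d = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(d)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in X) / n) or 1.0 for j in range(d)]
    Z = [[(r[j] - mu[j]) / sd[j] for j in range(d)] for r in X]
    C = [[sum(r[i] * r[j] for r in Z) / n for j in range(d)] for i in range(d)]
    rng = random.Random(0)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        s = math.sqrt(dot(v, v))
        v = [x / s for x in v]
        for _ in range(iters):  # power iteration: v <- Cv / |Cv|
            w = [dot(row, v) for row in C]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in C])  # variance along v
        comps.append(v)
        # deflation: remove this component so the next pass finds the next one
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return {"mu": mu, "sd": sd, "comps": comps}

def scores(pca, x):
    # project one observation using the *training* mean, sd and components
    z = [(xj - m) / s for xj, m, s in zip(x, pca["mu"], pca["sd"])]
    return [dot(c, z) for c in pca["comps"]]

def fit_logistic(F, y, lr=0.5, epochs=150):
    """Batch gradient descent on the log-loss; the last weight is the intercept."""
    n, d = len(F), len(F[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        g = [0.0] * (d + 1)
        for f, t in zip(F, y):
            err = sigmoid(dot(w, f) + w[-1]) - t
            for j in range(d):
                g[j] += err * f[j]
            g[-1] += err
        w = [wj - lr * gj / n for wj, gj in zip(w, g)]
    return w

def predict(model, x):
    # 1 = "Y", 0 = "N"
    f = scores(model["pca"], x)
    return int(sigmoid(dot(model["w"], f) + model["w"][-1]) >= 0.5)

def select_k(X, y, folds=5, seed=1):
    """Score k = 1..d components by k-fold CV accuracy, then refit the best k on all data."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    fold_of = {i: pos % folds for pos, i in enumerate(idx)}
    results = {}
    for k in range(1, len(X[0]) + 1):
        correct = 0
        for f in range(folds):
            tr = [i for i in idx if fold_of[i] != f]
            te = [i for i in idx if fold_of[i] == f]
            pca = fit_pca([X[i] for i in tr], k)  # fitted on training rows only
            w = fit_logistic([scores(pca, X[i]) for i in tr], [y[i] for i in tr])
            correct += sum(predict({"pca": pca, "w": w}, X[i]) == y[i] for i in te)
        results[k] = correct / len(X)
    best_k = max(results, key=results.get)  # ties go to the smaller k
    pca = fit_pca(X, best_k)
    best = {"k": best_k, "pca": pca, "w": fit_logistic([scores(pca, x) for x in X], y)}
    return best, results

def make_demo_data(seed, n=100):
    # hypothetical synthetic data: 2 noisy copies of a hidden signal + 4 pure-noise columns
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        X.append([z + rng.gauss(0.0, 0.3) for _ in range(2)] + [rng.gauss(0.0, 1.0) for _ in range(4)])
        y.append(int(z > 0))
    return X, y

if __name__ == "__main__":
    best, results = select_k(*make_demo_data(42))
    print("CV accuracy per k:", results)
    blob = json.dumps(best)  # write this string to a file to keep the best model
    print("best k:", best["k"], "->", predict(json.loads(blob), [3, 3, 0, 0, 0, 0]))
```

Accuracy is used here only for simplicity; log-loss or AUC are common alternatives. In R, the hand-written pieces would normally be replaced by `prcomp` and `glm(..., family = binomial)`, keeping the same structure: fit the PCA inside each training fold, project the held-out fold, score, and refit the winning k on all the data.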

Andrej
I would entirely agree with Andrej. This is one of those rare cases where principal components will probably do a better job than factor analysis. – richiemorrisroe Feb 07 '11 at 13:46
Sorry, I have one question: should I include the variable I want to predict in the PCA? – mariana soffer Feb 08 '11 at 01:58
No, leave your dependent variable (the Y/N variable in your case) out of the PCA dimension reduction. You can read my article on the same topic at http://goo.gl/lJh5s – Andrej Feb 08 '11 at 08:57
You may find the 'logisticPCA' package useful. It is an R package for dimensionality reduction of binary data:
https://cran.r-project.org/web/packages/logisticPCA/vignettes/logisticPCA.html

skan