I have a dataset of some 100 samples, each with >10,000 features, some of which are highly correlated. Here is my current procedure:
1. Split the data set into three folds.
2. For each fold:
   2.1 Run elastic net for 100 values of lambda (this returns an nfeatures x 100 coefficient matrix).
   2.2 Take the union of all non-zero weights across the 100 lambda values (returning an nfeatures x 1 indicator vector).
3. Select the features corresponding to the non-zero weights returned from step 2.2.
4. Use these features to train and test an SVM (a sketch of the whole pipeline follows this list).
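For concreteness, here is a minimal sketch of what I'm doing, assuming scikit-learn. The data, the `l1_ratio`, and the linear SVM kernel are placeholders, not necessarily my exact setup:

```python
import numpy as np
from sklearn.linear_model import enet_path
from sklearn.model_selection import KFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10_000))   # placeholder: 100 samples, >10k features
y = rng.integers(0, 2, size=100)         # placeholder binary labels

fold_features = []                       # one set of selected features per fold
kf = KFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    X_tr, y_tr = X[train_idx], y[train_idx]

    # Step 2.1: elastic net over a path of 100 lambda values;
    # coefs has shape (n_features, 100), one column per lambda.
    _, coefs, _ = enet_path(X_tr, y_tr.astype(float),
                            l1_ratio=0.5, n_alphas=100)

    # Step 2.2: union of non-zero weights across the whole path.
    selected = np.flatnonzero(np.any(coefs != 0, axis=1))
    fold_features.append(set(selected))

    # Steps 3-4: train and test an SVM on the selected features only.
    svm = SVC(kernel="linear").fit(X_tr[:, selected], y_tr)
    score = svm.score(X[test_idx][:, selected], y[test_idx])
    print(f"fold accuracy: {score:.2f} with {len(selected)} features")
```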
My problem is that in step 3 I get a different set of features for each fold. How do I get one final model out of this? One final list of relevant features? Can I take the intersection across all three folds of the features selected in step 3? Features selected in all three folds would appear to be the most stable/significant. Can I do this, or is it cheating?
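To be explicit about the aggregation I have in mind, assuming `fold_features` is the list of per-fold selection sets from the sketch above:

```python
from collections import Counter
from functools import reduce

# Intersection: features selected in every one of the three folds.
stable = reduce(set.intersection, fold_features)

# Or count selection frequency, so the threshold can be relaxed
# (e.g. "selected in at least 2 of 3 folds" instead of all 3).
counts = Counter(f for s in fold_features for f in s)
at_least_two = {f for f, c in counts.items() if c >= 2}
```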