I have a dataset of some 100 samples, each with >10,000 features, some of which are highly correlated. Here is my current procedure:
1. Split the data set into three folds.
2. For each fold:
   2.1 Run elastic net for 100 values of lambda (this returns an nfeatures x 100 coefficient matrix).
   2.2 Take the union of all non-zero weights across the 100 lambda values (returning an nfeatures x 1 indicator vector).
3. Select the features corresponding to the non-zero weights returned from step 2.2.
4. Use these features to train and test an SVM (a sketch of the whole pipeline follows this list).
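For concreteness, here is a minimal sketch of what I'm doing, assuming scikit-learn. The data, the `l1_ratio`, and the linear SVM kernel are placeholders, not necessarily my exact setup:

```python
import numpy as np
from sklearn.linear_model import enet_path
from sklearn.model_selection import KFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10_000))   # placeholder: 100 samples, >10k features
y = rng.integers(0, 2, size=100)         # placeholder binary labels

fold_features = []                       # one set of selected features per fold
kf = KFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    X_tr, y_tr = X[train_idx], y[train_idx]

    # Step 2.1: elastic net over a path of 100 lambda values;
    # coefs has shape (n_features, 100), one column per lambda.
    _, coefs, _ = enet_path(X_tr, y_tr.astype(float),
                            l1_ratio=0.5, n_alphas=100)

    # Step 2.2: union of non-zero weights across the whole path.
    selected = np.flatnonzero(np.any(coefs != 0, axis=1))
    fold_features.append(set(selected))

    # Steps 3-4: train and test an SVM on the selected features only.
    svm = SVC(kernel="linear").fit(X_tr[:, selected], y_tr)
    score = svm.score(X[test_idx][:, selected], y[test_idx])
    print(f"fold accuracy: {score:.2f} with {len(selected)} features")
```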
My problem is that in step 3 I get a different set of features for each fold. How do I get one final model out of this? One final list of relevant features? Can I take the intersection across all three folds of the features selected in step 3? Features selected in all three folds would appear to be the most stable/significant. Can I do this, or is it cheating?
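To be explicit about the aggregation I have in mind, assuming `fold_features` is the list of per-fold selection sets from the sketch above:

```python
from collections import Counter
from functools import reduce

# Intersection: features selected in every one of the three folds.
stable = reduce(set.intersection, fold_features)

# Or count selection frequency, so the threshold can be relaxed
# (e.g. "selected in at least 2 of 3 folds" instead of all 3).
counts = Counter(f for s in fold_features for f in s)
at_least_two = {f for f, c in counts.items() if c >= 2}
```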