I have a classifier, and I am using leave-one-out cross-validation to assess its performance.
On each iteration, I divide the dataset into training and testing sets. The testing set is just the single held-out subject that I am going to evaluate (the one being left out).
Then, I divide the training set into folds and do feature selection as follows:
I run my filter feature-selection algorithm on every fold. Once that is done, I use a voting scheme over the folds to obtain the final set of features, keeping the ones that were selected across folds.
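To make the setup concrete, here is a minimal sketch of one reading of this procedure in Python with scikit-learn: an outer leave-one-out loop, an inner split of the training set into folds where a univariate filter (ANOVA F-score via `f_classif`) is run on each fold's training portion, and a majority vote across folds to pick the final feature set. The classifier (`SVC`), the number of features kept per fold (`k_per_fold`), and the voting threshold (`vote_threshold`) are all illustrative assumptions, not details from the question.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, StratifiedKFold
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

# Toy data with the stated shape: 30 subjects, 960 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 960))
y = rng.integers(0, 2, size=30)

k_per_fold = 50      # features kept by the filter in each inner fold (assumed)
vote_threshold = 3   # keep a feature selected in >= this many folds (assumed)

loo = LeaveOneOut()
predictions = np.empty(len(y))

for train_idx, test_idx in loo.split(X):
    X_train, y_train = X[train_idx], y[train_idx]

    # Inner folds: run the filter on each fold's training part, record votes.
    votes = np.zeros(X.shape[1], dtype=int)
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for inner_train_idx, _ in inner.split(X_train, y_train):
        selector = SelectKBest(f_classif, k=k_per_fold)
        selector.fit(X_train[inner_train_idx], y_train[inner_train_idx])
        votes += selector.get_support().astype(int)

    # Voting: the final feature set for this outer iteration.
    selected = votes >= vote_threshold

    # Fit the classifier on the selected features, predict the held-out subject.
    clf = SVC().fit(X_train[:, selected], y_train)
    predictions[test_idx] = clf.predict(X[test_idx][:, selected])

print(f"LOOCV accuracy: {np.mean(predictions == y):.3f}")
```

Note that in this sketch the feature votes are recomputed inside every outer iteration, so the held-out subject never influences which features are chosen for its own prediction.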
I understand that this procedure is appropriate when you have a small sample, as in my case (30 subjects, 960 features).
My question is: why, if at all, would it be a bad idea to do feature selection on the whole training set instead of dividing it into folds?