A colleague of mine suggested using PCA prior to RFE to reduce the dimensionality (102 features versus 37 samples) and to get rid of the correlation problem: if I use an RFE with a support vector regression (SVR), the sparse solution may arbitrarily choose one of two highly correlated features. Would you agree with that logic?
The PCA returned 36 components that account for 100% of the variance. I therefore subjected those 36 components to an RFE with 5-fold cross-validation (kfold = 5) and hyperparameter optimization using GridSearchCV. However, the RFE performs very badly: the scoring metric is the r2 score, and I got an r2 score of -0.24. Looking at the individual folds, I see some very high r2 scores on the training sets (0.7 - 0.9), but poor r2 scores on the test sets. Does that mean that my model is overfitted? Do you see other reasons?
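To make the setup concrete, here is a minimal sketch of this kind of pipeline. It is not the actual code: it assumes scikit-learn, uses synthetic data of the same shape (37 x 102) in place of the real data set, and the linear kernel, the number of PCA components and the grid values are placeholders chosen only for illustration. It also checks why PCA can give at most 36 informative components here: after centering, 37 samples span at most 36 dimensions. Scaling, PCA and RFE sit inside the pipeline so they are refit on each training fold, and the grid search is wrapped in an outer cross-validation so that the train and test r2 scores can be compared per fold.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, KFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the real data (assumption): 37 samples, 102 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(37, 102))
y = X[:, 0] + 0.5 * rng.normal(size=37)

# Centered data with n samples has rank at most n - 1, here 36.
rank = np.linalg.matrix_rank(X - X.mean(axis=0))
print("rank of centered X:", rank)

# All preprocessing lives in the pipeline, so it is fit on training folds only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),  # placeholder value
    ("rfe", RFE(SVR(kernel="linear"), n_features_to_select=5)),
    ("svr", SVR(kernel="linear")),
])

# Small illustrative grid (placeholder values).
param_grid = {
    "rfe__n_features_to_select": [3, 5],
    "svr__C": [0.1, 1.0, 10.0],
}

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(pipe, param_grid, scoring="r2", cv=inner_cv)

# Outer loop: estimate how the tuned pipeline generalizes to unseen folds.
scores = cross_validate(
    search, X, y, scoring="r2", cv=outer_cv, return_train_score=True
)
print("train r2 per fold:", np.round(scores["train_score"], 2))
print("test r2 per fold: ", np.round(scores["test_score"], 2))
```

A large gap between the train and test scores in the output of such a run is the pattern described above; with only about 7 samples per test fold, the per-fold r2 values are also very noisy.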
Many thanks for your help, Mike