
I've been reading up on feature selection and hyperparameter tuning, but I'm getting lost on how to properly code/set up the experiment. I'm running a classification ML experiment with 1200 samples and 400 features, and I would like to optimize my models. My plan is to use stratified k-fold cross-validation, RFE for feature selection, and hyperparameter tuning for models where applicable. My understanding is that both feature selection and hyperparameter tuning should happen within each fold of the cross-validation loop? I was wondering how that would be done in Python. My instinct is that I need some combination of RFE (or RFECV) and GridSearchCV?

Does this thought process make sense?

  1. Split the data into a training set and a test set, and set the test set aside for now.
  2. On the training set, run GridSearchCV with stratified k-fold cross-validation, embedding RFE inside each fold.
  3. Select the best model.
  4. Evaluate that model on the held-out test set (rough sketch of this below).
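
To make option 1 concrete, here's a minimal sketch of what I think it would look like. The RandomForestClassifier, the synthetic data, and the particular parameter grid are just stand-ins for my real model and dataset; the point is the structure, where RFE sits inside a pipeline so it gets refit on every fold:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline

# Stand-in data with the same shape as my problem (1200 samples, 400 features)
X, y = make_classification(n_samples=1200, n_features=400, random_state=0)

# 1. Split into training/test; the test set is not touched until the end
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2. Pipeline so RFE is refit inside every CV fold, never on held-out data
pipe = Pipeline([
    ("rfe", RFE(estimator=RandomForestClassifier(random_state=0))),
    ("clf", RandomForestClassifier(random_state=0)),
])

param_grid = {
    "rfe__n_features_to_select": [20, 50, 100],  # tune the number of kept features
    "clf__n_estimators": [100, 300],
    "clf__max_depth": [None, 10],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)

# 3. Best model (GridSearchCV refits it on the full training set)
print(search.best_params_, search.best_score_)

# 4. Single final evaluation on the held-out test set
print("test accuracy:", search.score(X_test, y_test))
```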

OR

  1. Split the data into training/test.
  2. Run k-fold RFE (e.g. RFECV) on the training set for a given model.
  3. Keep the features identified by RFE.
  4. Then perform hyperparameter tuning on those features (sketch of this below).
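
And here's how I picture option 2, with the same stand-in classifier and synthetic data: feature selection runs once up front on the training set, and tuning then happens on the reduced feature matrix:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Stand-in data again (1200 samples, 400 features)
X, y = make_classification(n_samples=1200, n_features=400, random_state=0)

# 1. Split into training/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# 2./3. K-fold RFE on the training set only, then keep the selected features
selector = RFECV(RandomForestClassifier(random_state=0), step=10, cv=cv)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# 4. Hyperparameter tuning on the reduced feature set
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=cv)
search.fit(X_train_sel, y_train)

print("selected features:", selector.n_features_)
print("test accuracy:", search.score(X_test_sel, y_test))
```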

Does this make sense? Could someone provide some example code so I can see it laid out?

Thanks!

wsheikh92
Does this answer your question? [How should Feature Selection and Hyperparameter optimization be ordered in the machine learning pipeline?](https://stats.stackexchange.com/questions/264533/how-should-feature-selection-and-hyperparameter-optimization-be-ordered-in-the-m) – skeller88 Apr 16 '20 at 18:07
