
Using the following linear regression model workflow, I was able to generate a model that was robust to LOOCV. From posts such as this, I know that feature selection should be done inside the LOOCV loop. I did feature selection outside of my LOO loop, but performed it in a LOO fashion, and used the method only to remove features. I was wondering whether my method suffers from the same form of bias that is highlighted in posts such as the one linked.

  1. Remove one case

  2. Perform feature selection

  3. Repeat steps 1) and 2) until all cases have been iterated through

  4. Identify features that were selected in ALL iterations of 3)

  5. Perform LOOCV using just the features identified from step 4)
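The five steps above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the data are synthetic, and `SelectKBest` with `f_regression` stands in for the unspecified feature-selection method.

```python
# Sketch of the workflow in steps 1-5 (assumed data and selector).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=30)

loo = LeaveOneOut()

# Steps 1-3: remove one case, select features on the remaining cases,
# and repeat until every case has been left out once.
masks = []
for train_idx, _ in loo.split(X):
    sel = SelectKBest(f_regression, k=3).fit(X[train_idx], y[train_idx])
    masks.append(sel.get_support())

# Step 4: keep only the features selected in ALL iterations.
stable = np.logical_and.reduce(masks)
print("stable features:", np.flatnonzero(stable))

# Step 5: LOOCV of the regression on just those features. Note that
# every observation influenced the feature set used here, which is
# the source of the bias discussed in the answer below.
scores = cross_val_score(LinearRegression(), X[:, stable], y,
                         cv=loo, scoring="neg_mean_squared_error")
print("LOOCV MSE:", -scores.mean())
```

With this synthetic signal in the first two columns, those columns are selected in every iteration, so they survive the intersection in step 4.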

kjetil b halvorsen

1 Answer

Ultimately, all observations have been used for feature selection, so the final LOOCV will be biased, as explained in the linked post (top-voted answer, first sentence): "If you perform feature selection on all of the data and then cross-validate, then the test data in each fold of the cross-validation procedure was also used to choose the features and this is what biases the performance analysis."

Christian Hennig
  • Thanks for the reply. I suspected that would be the reply. I just wasn't sure if there was something unique about using LOO to identify the features. – user3368693 Sep 26 '19 at 18:19