
I am running into something I have not experienced before and am a little confused. I have a set of about 60 predictor variables that I manually picked from a larger set. I have been running algorithms such as random forest, logistic regression, gradient boosting, and neural networks.

With the set of 60 variables I was achieving AUC values of around 70% for all algorithms. I then ran the Boruta algorithm to find important variables. After removing the unimportant variables, I now have 26. I reran the same ML algorithms and am now getting AUC values of 1 and 0.995 for logistic regression, gradient boosting, and random forest, which I assume cannot be right. Meanwhile, the neural network's AUC dropped to 50%.

I may be missing something obvious. Has anyone experienced this before, or can anyone explain what is happening?

  • You are engaging in a massive fishing expedition. There is nothing reliable in your result. Much has been written about the problems on this site. Note that two-stage approaches, where the second stage does not inherit penalization from the first stage, are particularly disastrous. – Frank Harrell May 03 '18 at 11:31
  • Possible duplicate of [Feature selection for "final" model when performing cross-validation in machine learning](https://stats.stackexchange.com/questions/2306/feature-selection-for-final-model-when-performing-cross-validation-in-machine) – Sycorax Jul 30 '18 at 02:38
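
The comments point at the likely cause: the Boruta step was fitted on the full dataset, so the held-out rows used to compute the AUC had already influenced which 26 variables survived. A minimal sketch of the leakage-free alternative, assuming scikit-learn; `SelectFromModel` stands in for Boruta here (the `BorutaPy` class from the `boruta` package is sklearn-compatible and could be swapped in), and the synthetic data and parameter values are illustrative only:

```python
# Minimal sketch: feature selection as a Pipeline step, so it is
# re-fitted on the training folds only and never sees the rows
# being scored.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# synthetic stand-in for the real data: 60 candidates, few informative
X, y = make_classification(n_samples=1000, n_features=60,
                           n_informative=10, random_state=0)

pipe = Pipeline([
    # selection is refit inside each training fold
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=0))),
    ("clf", LogisticRegression(max_iter=1000)),
])

# honest AUC estimate: selection + model fitting repeated per fold
aucs = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print(aucs.mean())
```

For contrast, the leaky version selects features on all of X first and then cross-validates on the reduced matrix; because the held-out rows helped decide which features survive, the resulting AUC is inflated, often dramatically when many candidate features are screened.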

0 Answers