I'm planning on doing feature selection with `RandomForestClassifier`, using its `feature_importances_` and `oob_score_` attributes. My plan is to recursively drop the 20% least important features and measure the OOB error until I see a significant drop, as recommended here.
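To make the plan concrete, here is a minimal sketch of what I have in mind (toy data from `make_classification`; the 20% drop ratio and stopping point are just placeholders):

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data, just for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
features = np.arange(X.shape[1])  # indices of the currently kept features

base_rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)

# Recursively drop the 20% least important features, tracking the OOB score
while len(features) > 2:
    model = clone(base_rf).fit(X[:, features], y)
    print(len(features), "features, OOB score:", model.oob_score_)
    order = np.argsort(model.feature_importances_)  # ascending importance
    n_drop = max(1, int(0.2 * len(features)))
    features = features[order[n_drop:]]  # keep all but the least important 20%
```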
BUT, I'm baffled by a comment I saw in a scikit-learn example for tracking OOB errors. It says: "Setting the `warm_start` construction parameter to `True` is necessary for tracking the OOB error trajectory during training."
What do they mean by that? I was planning on using `clone` for each iteration with a new subset of features, and comparing the `oob_score_` of each classifier. Am I missing something, or is `warm_start` just a performance recommendation specific to that particular example?