
Since decision trees don't use all the input features and select them during training, is it useful to do feature selection beforehand?

As I see it, choosing features beforehand will decrease computing time (and perhaps reduce the risk of overfitting on a small dataset?), but since several weak features combined can perform better than a few strong ones, I may also end up with worse predictions.

EDIT: Bonus question: Is there a way to select features before a decision tree, or should I let it do the work?
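
To make the bonus question concrete, here is a minimal sketch of the kind of pre-selection I have in mind, assuming scikit-learn and using an L1-penalised logistic regression as a classification stand-in for LASSO (the dataset and parameter values are purely illustrative):

    # Hypothetical sketch: L1-based feature selection, then a decision tree
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectFromModel
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                               random_state=0)

    # Keep only the features the L1-penalised model assigns non-zero weights to
    selector = SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=0.1))
    X_reduced = selector.fit_transform(X, y)

    # Fit the tree on the reduced feature set
    tree = DecisionTreeClassifier(random_state=0).fit(X_reduced, y)
    print(X_reduced.shape)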

CoMartel
  • The question is how can you select them before the decision tree? – Metariat Sep 19 '16 at 07:36
  • I was thinking of using a feature selection technique, maybe LASSO or something else – CoMartel Sep 19 '16 at 07:46
  • Variables that are important in LASSO don't necessarily have the same relationship with the outcome as in a decision tree. You can see a related question here: http://stats.stackexchange.com/questions/164048/can-random-forest-be-used-for-feature-selection-in-multiple-linear-regression – Metariat Sep 19 '16 at 07:53
  • Ok, I see your point. The remaining question is: is there a way to select features before a decision tree, or should I let it do the work? – CoMartel Sep 19 '16 at 08:11
  • In my personal experience, I don't see any way to select features before building the tree. – Metariat Sep 19 '16 at 08:12
  • If your goal is to reduce features to make a smaller or more general tree, you may want to use dimensionality reduction like PCA rather than feature selection like LASSO. Unlike feature selection, which just keeps a subset of features, dimensionality reduction may merge features. Another option is to just prune your tree (allow only n levels). Yet another option is to get the "importance" (predictive power) of each feature by predetermining its information gain, a common split criterion in trees, and removing the unimportant ones. (See the sketch below this comment thread.) – Victor Stoddard Jul 04 '17 at 22:24
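
A minimal sketch of the first two options above (PCA before the tree, and pruning by capping the depth), assuming scikit-learn; the dataset and parameter values are only illustrative:

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=30, random_state=0)

    # (a) dimensionality reduction: project onto 10 principal components,
    #     then fit the tree on the merged features
    pca_tree = make_pipeline(PCA(n_components=10),
                             DecisionTreeClassifier(random_state=0))
    pca_tree.fit(X, y)

    # (b) pruning: allow only n levels in the tree
    pruned_tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)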

1 Answer


Decision Trees are pretty good at finding the most important features: they consider all features and split on the one that separates the class labels best (in terms of entropy).
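
For example, with scikit-learn you can simply fit a tree and inspect the importance it assigned to each feature afterwards; this is only a minimal sketch on a toy dataset:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    # criterion="entropy" makes the tree split on information gain
    tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
    print(tree.feature_importances_)  # one importance score per input feature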

Random Forests are even better in this regard, because some implementations (like scikit-learn's) sample the features and use only a subset of them at each split. In general, Random Forests are also more robust than single decision trees.
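
A minimal sketch of that feature subsampling in scikit-learn, where max_features controls how many features each split considers (the values here are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    # Each split in each tree looks at a random subset of sqrt(n_features) features
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                    random_state=0).fit(X, y)
    print(forest.feature_importances_)  # importances averaged over the forest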

If you want, you can compute the Information Gain before building a Decision Tree to see how much information a particular feature carries about the label:

https://en.wikipedia.org/wiki/Information_gain_in_decision_trees
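
As a rough sketch of that idea, scikit-learn's mutual_info_classif estimates the mutual information (the information-gain quantity above) between each feature and the label without building any tree; the dataset is just an example:

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import mutual_info_classif

    X, y = load_iris(return_X_y=True)
    # One score per feature: how much knowing the feature reduces label uncertainty
    print(mutual_info_classif(X, y, random_state=0))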

Peter Csizsek