The article reiterates the classic approach to feature selection: choose the features that minimize your error. The ensemble techniques you mention, on the other hand, yield feature importances as a natural by-product of the algorithms themselves.
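To make that concrete, here is a minimal sketch of reading importances off a trained ensemble; the toy dataset and variable names are just placeholders, not anything from the article:

```python
# Minimal sketch: feature importances fall out of an ensemble once it is
# trained. The breast-cancer toy dataset stands in for your own X and y.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# No separate selection step needed -- the ranking is a by-product of training.
importances = pd.Series(forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```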
It is always a good idea to combine your business knowledge of the features with an appropriate feature selection algorithm. Throwing too many features at the ensembles in the hope of getting the important, uncorrelated ones as a by-product will make your training time too long, and may give you poorer results than training with better-chosen features (which is why you want to select features in the first place). On the other hand, selecting features by adding them one by one (as the article explains) means expensive retraining of your model at each iteration.
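For contrast, a rough sketch of that one-by-one (forward) approach, here using scikit-learn's SequentialFeatureSelector rather than the article's own code: every step refits the estimator once per remaining candidate feature, which is exactly where the cost comes from. The dataset and parameters below are arbitrary placeholders:

```python
# Rough sketch of forward selection: each step refits the estimator once per
# remaining candidate feature, so the cost grows quickly with feature count.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(as_frame=True, return_X_y=True)

selector = SequentialFeatureSelector(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    n_features_to_select=5,   # placeholder budget
    direction="forward",
    cv=3,
)
selector.fit(X, y)
print(list(X.columns[selector.get_support()]))
```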
So start with features that make business sense and are minimally correlated, train on a subset of your training data, and plot learning curves to assess whether each feature improves or hurts performance. Perhaps even look at correlations and at feature importances from different techniques to see which features they agree on. In the end, data science is as much an art as it is a science.
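One way to sketch that cross-check: compare the top features from two different ranking techniques (impurity-based importance versus mutual information) and flag highly correlated pairs. The dataset, the 0.95 threshold, and the top-10 cut-off are all illustrative assumptions, not a recipe:

```python
# Sketch: do two ranking techniques agree on the top features, and which
# feature pairs are so correlated that one of them is likely redundant?
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(as_frame=True, return_X_y=True)

# Ranking 1: impurity-based importances from a random forest.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
rf_top = set(pd.Series(forest.feature_importances_, index=X.columns).nlargest(10).index)

# Ranking 2: mutual information between each feature and the target.
mi = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
mi_top = set(mi.nlargest(10).index)

print("Features both rankings agree on:", sorted(rf_top & mi_top))

# Highly correlated pairs -- good candidates for dropping one of the two.
corr = X.corr().abs()
pairs = [(a, b, round(corr.loc[a, b], 2))
         for i, a in enumerate(corr.columns)
         for b in corr.columns[i + 1:]
         if corr.loc[a, b] > 0.95]
print(pairs[:5])
```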