I need to choose the best subset of features from a pool of 200.
The approach I am currently using is:
- Loop over the candidate features.
- In each iteration, add one candidate feature to the model, evaluate the model's loss, store that loss value, and then remove the feature again.
- Repeat this for each of the 200 features.
- This identifies the feature for which the loss was minimum.
- That feature is added to the final feature set, so the final set now contains one feature.
This whole procedure of gradually adding features to the final set is repeated until convergence.
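The procedure above (a greedy forward selection) can be sketched as follows. The `loss` function and the toy loss table are hypothetical stand-ins for however you actually fit and score the model; the numbers are invented purely to illustrate the A/B/C scenario from the question below.

```python
def forward_selection(all_features, loss):
    """Greedy forward selection: repeatedly add the single feature
    that most reduces the loss, stopping when no feature helps."""
    selected = []
    remaining = list(all_features)
    best_loss = float("inf")
    while remaining:
        # Try adding each remaining feature in turn and record the loss.
        losses = {f: loss(selected + [f]) for f in remaining}
        candidate = min(losses, key=losses.get)
        if losses[candidate] >= best_loss:
            break  # convergence: no single addition improves the loss
        best_loss = losses[candidate]
        selected.append(candidate)
        remaining.remove(candidate)
    return selected, best_loss

# Hypothetical loss table: A looks best on its own, but B and C
# together beat every subset that contains A.
toy_losses = {
    frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 20,
    frozenset("AB"): 9, frozenset("AC"): 9, frozenset("BC"): 5,
    frozenset("ABC"): 8,
}
loss = lambda feats: toy_losses.get(frozenset(feats), float("inf"))

selected, final_loss = forward_selection("ABC", loss)
# The greedy path picks A first and ends at ['A', 'B', 'C'] with
# loss 8, even though {B, C} alone would have achieved loss 5.
```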
Question:
Consider three features: A, B and C. When the model has only A as a feature, I get a loss of, say, 10; with only B, a loss of 20; and with only C, a loss of 20 as well. Is it possible that a combination of B and C alone gives a better model than any set that includes A (together with B and/or C, if desired)?
Is there any flaw in my method?