I need to choose the best subset of features from a pool of 200.
The approach I am currently using is:
- Loop over the candidate features.
- In each iteration, add one candidate feature to the model, evaluate the model's loss, store that loss value, and then remove the feature again.
- Repeat this for each of the 200 features.
- This identifies the feature for which the loss was minimum.
- That feature is added to the final feature set, so the final set now contains one feature.
This whole procedure of gradually adding features to the final set is repeated until convergence.
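The procedure above (a greedy forward selection) can be sketched as follows. The `loss` function and the toy loss table are hypothetical stand-ins for however you actually fit and score the model; the numbers are invented purely to illustrate the A/B/C scenario from the question below.

```python
def forward_selection(all_features, loss):
    """Greedy forward selection: repeatedly add the single feature
    that most reduces the loss, stopping when no feature helps."""
    selected = []
    remaining = list(all_features)
    best_loss = float("inf")
    while remaining:
        # Try adding each remaining feature in turn and record the loss.
        losses = {f: loss(selected + [f]) for f in remaining}
        candidate = min(losses, key=losses.get)
        if losses[candidate] >= best_loss:
            break  # convergence: no single addition improves the loss
        best_loss = losses[candidate]
        selected.append(candidate)
        remaining.remove(candidate)
    return selected, best_loss

# Hypothetical loss table: A looks best on its own, but B and C
# together beat every subset that contains A.
toy_losses = {
    frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 20,
    frozenset("AB"): 9, frozenset("AC"): 9, frozenset("BC"): 5,
    frozenset("ABC"): 8,
}
loss = lambda feats: toy_losses.get(frozenset(feats), float("inf"))

selected, final_loss = forward_selection("ABC", loss)
# The greedy path picks A first and ends at ['A', 'B', 'C'] with
# loss 8, even though {B, C} alone would have achieved loss 5.
```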
Question:
Consider three features: A, B and C. When the model has only A as a feature, I get a loss of, say, 10; with only B, a loss of 20; and with only C, a loss of 20 as well. Is it possible that a combination of B and C alone gives a better model than any set that includes A (together with B and/or C, if desired)?
Is there any flaw in my method?