I'm currently taking a machine learning course at university and have come across a concept that I'm having trouble wrapping my head around. I would appreciate some help.
We've recently been given an assignment to implement linear regression. We're given a set of data, which we split into training and validation sets. The part I'm having trouble with is feature selection.
More specifically, the data that we've been given has exactly 126 features. We need to use for loops to fit models on different subsets of features (1 to 100 features).
For example, the first step would build 100 models using one feature, the next using pairs of features, etc.
Our instructor has told us that we should "take a greedy approach to implementing feature selection". From my understanding, it basically means we keep the features that perform best, and discard the ones that don't. Is my understanding correct?
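To make my understanding concrete, here is a rough sketch of what I think "greedy" means here. It follows one common reading, forward selection: at each step, try adding each remaining feature to the set chosen so far and keep the single one that gives the lowest validation error. The use of NumPy, validation MSE as the selection criterion, and the function names `fit_and_score` and `greedy_forward_selection` are all my own assumptions, not something the instructor specified.

```python
import numpy as np


def fit_and_score(X_train, y_train, X_val, y_val, cols):
    """Fit ordinary least squares on the given columns; return validation MSE."""
    # Prepend a column of ones so the model has an intercept term.
    A_train = np.column_stack([np.ones(len(X_train)), X_train[:, cols]])
    A_val = np.column_stack([np.ones(len(X_val)), X_val[:, cols]])
    w, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return float(np.mean((A_val @ w - y_val) ** 2))


def greedy_forward_selection(X_train, y_train, X_val, y_val, max_features):
    """Hypothetical helper: grow the feature set one feature at a time."""
    selected = []   # column indices kept so far, in the order they were added
    history = []    # validation MSE after each feature is added
    remaining = list(range(X_train.shape[1]))

    for _ in range(min(max_features, len(remaining))):
        best_col, best_mse = None, np.inf
        # Try each unused feature together with the ones already selected.
        for col in remaining:
            mse = fit_and_score(X_train, y_train, X_val, y_val, selected + [col])
            if mse < best_mse:
                best_col, best_mse = col, mse
        # Greedy step: commit to the best feature and never revisit the choice.
        selected.append(best_col)
        remaining.remove(best_col)
        history.append(best_mse)

    return selected, history
```

If I have this right, the point of being greedy is cost: trying every pair of 126 features would mean 7875 models, whereas this approach fits at most 126 models per step. The trade-off is that a feature chosen early is never dropped, so the final subset is not guaranteed to be the best one of its size.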
Any feedback is appreciated. Thank you.