What would it mean to select features in a "greedy" fashion?

Question

I'm currently taking a machine learning course at university and have come across a concept that I'm having trouble wrapping my head around. I would appreciate some help.

We've recently been given an assignment to implement linear regression. We're given a set of data, which we split into training and validation sets. The part I'm having trouble with is feature selection.

More specifically, the data that we've been given has exactly 126 features. We need to use for loops to fit models on different subsets of features (1 to 100 features).

For example, the first step would build 100 models using one feature, the next using pairs of features, etc.

Our instructor has told us that we should "take a greedy approach to implementing feature selection". From my understanding, it basically means we keep the features that perform best, and discard the ones that don't. Is my understanding correct?

Any feedback is appreciated. Thank you.

Note that this is, for most intents and purposes, a horrible idea. See here: https://stats.stackexchange.com/questions/20836/algorithms-for-automatic-model-selection?s=3|0.0000 — Frans Rodenburg, Oct 29 '18 at 13:13
Thanks for the tip @FransRodenburg. I'll keep this post in mind. — Sean, Oct 29 '18 at 13:48

score 4 · Accepted Answer · answered Oct 29 '18 at 11:44

Here’s my interpretation of greedy feature selection in your context.

First, you train one model per feature, each using only that single feature (so there will be 126 models).

Second, you take the feature from the best-performing model of the previous step and train new models that use this feature together with one of the remaining features (so there will be 125 models).

You continue in this way, adding one feature per round, and the procedure terminates when adding a new feature no longer improves performance.
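The steps above can be sketched in Python roughly as follows. This is only a minimal illustration under some assumptions that are not in the question: "performance" is taken to be mean squared error on the validation set, the models are ordinary least squares fits via NumPy, and the data is synthetic. The function names `validation_mse` and `greedy_forward_selection` are made up for this example.

```python
import numpy as np

def validation_mse(X_train, y_train, X_val, y_val, features):
    """Fit least squares on the given columns and return the validation MSE."""
    # Prepend a column of ones so the model has an intercept.
    A_train = np.column_stack([np.ones(len(X_train))] + [X_train[:, j] for j in features])
    A_val = np.column_stack([np.ones(len(X_val))] + [X_val[:, j] for j in features])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    residuals = y_val - A_val @ coef
    return float(np.mean(residuals ** 2))

def greedy_forward_selection(X_train, y_train, X_val, y_val):
    """Add one feature per round; stop when no candidate lowers the validation MSE."""
    selected = []
    remaining = list(range(X_train.shape[1]))
    best_score = np.inf
    while remaining:
        # One model per remaining feature, each combined with the features kept so far.
        scores = {
            j: validation_mse(X_train, y_train, X_val, y_val, selected + [j])
            for j in remaining
        }
        best_j = min(scores, key=scores.get)
        if scores[best_j] >= best_score:
            break  # no improvement from any candidate, so terminate
        selected.append(best_j)
        remaining.remove(best_j)
        best_score = scores[best_j]
    return selected, best_score

# Synthetic data (an assumption for illustration): only features 1 and 4 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)
X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

selected, score = greedy_forward_selection(X_train, y_train, X_val, y_val)
print("selected features (in order added):", selected)
print("validation MSE:", score)
```

With 126 features the first round fits 126 models, the second 125, and so on, which matches the counts described above.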

Hope it helps.

Thanks for the feedback! I also think that interpretation makes sense. — Sean, Oct 29 '18 at 12:39

score 2 · Answer 2 · answered Oct 29 '18 at 15:58

It probably helps to define what your instructor means by "greedy approach".

They are presumably talking about something like a greedy algorithm. In these algorithms, we have a choice of actions at any given point, and we always choose the action that gives us the largest immediate return. Without heavy assumptions (that usually don't hold!), there's no guarantee that we will find the optimal solution, but hopefully we can find a pretty good one.

How would this work in your problem? Well, the action you can take is "add another feature". So at each step of your algorithm, you first calculate the benefit of adding each available feature, and then add the feature with the most benefit. Repeat until no feature gives any benefit or the maximum number of features is reached.
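To make the "largest return at each step" idea concrete, here is a rough Python sketch of a generic greedy loop, compared against an exhaustive search over all subsets of the same size. Everything specific here is an assumption for illustration: the helper names `greedy_select` and `exhaustive_select`, the use of negative validation MSE as the "benefit", the cap of 3 features, and the synthetic data. The exhaustive search is only feasible because there are 6 features; with 126 it would be far too expensive, which is part of why a greedy approach is used.

```python
from itertools import combinations
import numpy as np

def greedy_select(candidates, score, max_items):
    """At each step, add the candidate that increases score(chosen) the most."""
    chosen = []
    pool = list(candidates)
    current = score(chosen)
    while pool and len(chosen) < max_items:
        gains = {c: score(chosen + [c]) - current for c in pool}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # no remaining candidate gives any benefit
        chosen.append(best)
        pool.remove(best)
        current = current + gains[best]
    return chosen

def exhaustive_select(candidates, score, size):
    """Score every subset of the given size and return the best one."""
    return list(max(combinations(candidates, size), key=lambda s: score(list(s))))

# Synthetic regression data (an assumption for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] + 1.5 * X[:, 3] - 1.0 * X[:, 5] + 0.5 * rng.normal(size=200)
X_train, X_val, y_train, y_val = X[:150], X[150:], y[:150], y[150:]

def score(features):
    """Negative validation MSE of a least squares fit, so that larger is better."""
    A_train = np.column_stack([np.ones(len(X_train))] + [X_train[:, j] for j in features])
    A_val = np.column_stack([np.ones(len(X_val))] + [X_val[:, j] for j in features])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return -float(np.mean((y_val - A_val @ coef) ** 2))

greedy = greedy_select(range(6), score, max_items=3)
best = exhaustive_select(range(6), score, len(greedy))
print("greedy subset:    ", greedy, "score:", score(greedy))
print("exhaustive subset:", sorted(best), "score:", score(best))
```

The exhaustive result can never score worse than the greedy one at the same subset size, since it considers the greedy subset among all others. On some datasets the two will coincide, and on others the greedy choice will be somewhat worse, which is the lack of guarantee mentioned above.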
