
I'm learning about regression models via Andrew Ng's Coursera course. I have a question regarding automatically finding a good model.

Does it make sense (my guess is no) to iteratively add terms, or to iteratively swap the model for a different one, checking the $R^2$ or MSE at each step and keeping whichever model is best?

For example, you start with a model like $y = ax + b$, then change it to $y = ax^2 + b$, then to $y = ax^2 + bx + c$, and so on, continually checking whether each model is better or worse than the others.
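For what it's worth, here is roughly what that loop looks like in code. This is only a minimal sketch using scikit-learn; the toy data and the set of candidate degrees are made up for illustration. Note the comparison is done on held-out data, since training error alone always favors the more flexible model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data: a noisy quadratic (made up, just for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * x[:, 0] ** 2 - x[:, 0] + rng.normal(scale=0.5, size=200)

x_train, x_val, y_train, y_val = train_test_split(x, y, random_state=0)

# Try increasingly flexible models and compare held-out MSE
for degree in (1, 2, 3, 4):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    mse = mean_squared_error(y_val, model.predict(x_val))
    print(f"degree {degree}: validation MSE = {mse:.3f}")
```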

Like I said, I'm just beginning to learn this stuff, so I'm sure there is a better way to do this; it just popped into my head as a possibility.

  • Researching this a little further, I came across this thread https://math.stackexchange.com/questions/3069369/how-to-choose-degree-for-polynomial-regression which is basically asking the same thing, maybe a little more clearly. In it, AIC, BIC, and Chebyshev polynomials are mentioned, so I may have my answer. If anyone can expand on these here, that would be appreciated though. – isuckatprogramming May 02 '19 at 16:01
  • Another method I saw was choosing an arbitrarily large degree polynomial and using L2 regularization. This seems like it would introduce problems down the road though. – isuckatprogramming May 02 '19 at 16:11
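Expanding on the AIC suggestion in the first comment above: for an ordinary least-squares fit with Gaussian errors, AIC can be computed (up to an additive constant) as $n\ln(\mathrm{RSS}/n) + 2k$, and the degree with the lowest AIC is preferred. The sketch below is just one possible illustration; the toy data and degree range are made up:

```python
import numpy as np

def aic_for_degree(x, y, degree):
    """AIC (up to an additive constant) for an OLS polynomial fit,
    assuming Gaussian errors: n * ln(RSS / n) + 2k."""
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
    n, k = len(y), degree + 1  # k = number of fitted coefficients
    return n * np.log(rss / n) + 2 * k

# Toy data: a noisy cubic; pick the degree with the lowest AIC
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 150)
y = x ** 3 - x + rng.normal(scale=0.3, size=150)
best = min(range(1, 8), key=lambda d: aic_for_degree(x, y, d))
print("degree chosen by AIC:", best)
```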

1 Answer


The approach you propose is a common feature-selection strategy called forward stepwise selection. I strongly suggest you also look at other techniques such as PCA and the lasso. In any case, I will summarize forward stepwise selection:

In the first step of the analysis, we gather all candidate features (including powers such as $x_1^2$ and interactions such as $x_1x_2$) that we believe could be useful in the regression. Then we choose a small subset of this candidate set. A brute-force approach would be to train the model on every possible feature subset, but as you can guess, this is computationally very expensive. Instead, we look for a good path through the subsets, which forward stepwise selection finds iteratively. First, you start with a one-variable model: a single feature plus an intercept. For this slot, you train the model once with each candidate feature you have gathered, then choose the 'best-performing' one. The performance metric might be $R^2$, MSE, or something else (including AIC or BIC). However, do not forget to evaluate performance on a held-out data set (a validation set) that was not used in the training phase.

Once you have chosen this 'best' feature, you extend the model to two variables and test all remaining candidates in your feature set for the new slot. You iterate this operation until you arrive at a model that performs well enough and is no longer improved by further feature additions. A sketch of the whole procedure is below.
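Here is a minimal sketch of that loop in Python with scikit-learn. The greedy validation-MSE criterion, the synthetic data, and the `forward_stepwise` helper are my own illustrative choices, not a standard library API:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def forward_stepwise(X, y, feature_names):
    """Greedy forward selection: at each step, add the feature that most
    improves validation MSE; stop when no addition helps."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
    selected, remaining = [], list(range(X.shape[1]))
    best_mse = np.inf
    while remaining:
        # Score every one-feature extension of the current model
        scores = []
        for j in remaining:
            cols = selected + [j]
            model = LinearRegression().fit(X_tr[:, cols], y_tr)
            mse = mean_squared_error(y_val, model.predict(X_val[:, cols]))
            scores.append((mse, j))
        mse, j = min(scores)
        if mse >= best_mse:  # no candidate improves the model: stop
            break
        best_mse, selected = mse, selected + [j]
        remaining.remove(j)
        print(f"added {feature_names[j]}, validation MSE = {mse:.3f}")
    return selected

# Hypothetical feature pool: x, x^2, and an irrelevant noise column
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 300)
X = np.column_stack([x, x ** 2, rng.normal(size=300)])
y = 3 * x ** 2 - x + rng.normal(scale=0.4, size=300)
forward_stepwise(X, y, ["x", "x^2", "noise"])
```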

Monotros