The approach you have proposed is a common feature selection strategy called forward stepwise selection. See the link for a summary of it and other techniques; I strongly suggest you also check alternatives like PCA, the Lasso, etc. In any case, I will summarize forward stepwise selection:
In the first step of the analysis, we gather all candidate features (including powers such as $x_1^2$ and interactions such as $x_1 x_2$) that we believe could be useful in the regression. Then we want to choose a small subset of this candidate set. A brute-force approach would be to train the model on every possible feature subset; however, with $p$ candidates there are $2^p$ subsets, so this is computationally very expensive. Instead, we prefer to find a good path through them, and forward stepwise selection achieves this iteratively. First, you start with a single-variable model containing only one feature and an intercept. For this slot, you try out every feature you have gathered (training one model per feature) and then choose the 'best-performing' one. Here the performance metric might be $R^2$, $MSE$, or anything else (including $AIC$ and $BIC$). However, please do not forget to compute this on an isolated data set (a test set) that is not used in the training phase, as in the sketch below.
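To make that first step concrete, here is a minimal sketch in Python with scikit-learn. The linear model, the MSE metric, and the toy data (100 samples, 5 random candidate features) are all assumptions for illustration, not part of your problem:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Toy data: 100 samples, 5 candidate features; column 1 is the only informative one
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 1] + rng.normal(size=100)

# Hold out a test set so the score is not computed on the training data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: fit a one-feature model (with intercept) for every candidate
# and score each on the held-out set
scores = {}
for j in range(X.shape[1]):
    model = LinearRegression().fit(X_train[:, [j]], y_train)
    scores[j] = mean_squared_error(y_test, model.predict(X_test[:, [j]]))

best = min(scores, key=scores.get)  # lowest test MSE wins
print(f"best single feature: column {best}, test MSE = {scores[best]:.3f}")
```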
Once you have chosen this 'best' feature, you extend the model to two variables and test every remaining feature in your candidate set for the new slot. You iterate this operation until you arrive at a model that performs well enough and is no longer improved by further feature additions; a full pass is sketched below.
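Continuing the sketch above (it reuses `X_train`, `X_test`, `y_train`, `y_test`), one way to write the full forward pass follows. The stopping rule here, stop when the held-out MSE no longer drops, is just one simple choice; $AIC$, $BIC$, or cross-validated error would slot in the same way:

```python
# Grow the model one feature at a time; stop when no remaining feature
# improves the held-out MSE.
selected = []                        # features chosen so far
remaining = list(range(X.shape[1]))  # candidates not yet in the model
best_mse = np.inf

while remaining:
    # Score each candidate extension of the current model on the test set
    trial = {}
    for j in remaining:
        cols = selected + [j]
        model = LinearRegression().fit(X_train[:, cols], y_train)
        trial[j] = mean_squared_error(y_test, model.predict(X_test[:, cols]))
    j_best = min(trial, key=trial.get)
    if trial[j_best] >= best_mse:    # no improvement: stop adding features
        break
    best_mse = trial[j_best]
    selected.append(j_best)
    remaining.remove(j_best)

print("selected features:", selected, "| final test MSE:", round(best_mse, 3))
```

For real use, note that recent versions of scikit-learn also ship `sklearn.feature_selection.SequentialFeatureSelector` with `direction='forward'`, which wraps this same procedure with cross-validation built in.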