I see many people on the web claiming that R² is not an appropriate metric for choosing one regressor over another, and suggesting AIC or BIC instead. As I understand it, this implies that it's almost always preferable to avoid complex models, even when they're more accurate (I'm not sure this reading is correct).
My question is: what is wrong with using MSE? And to see whether the model is better than "always predict the mean", why not just check R²?
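To make the comparison concrete, here is a minimal sketch of how I understand the two metrics to relate when computed on the same data. The data is synthetic and the variable names (`y_true`, `y_pred`, `mse_baseline`) are my own, not from any particular source. It only illustrates that R² is the model's MSE rescaled by the MSE of the "always predict the mean" baseline.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic targets and hypothetical model predictions (illustration only)
rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.5, size=200)

# MSE of the model under evaluation
mse_model = mean_squared_error(y_true, y_pred)

# MSE of the baseline that always predicts the mean of y_true
mse_baseline = mean_squared_error(y_true, np.full_like(y_true, y_true.mean()))

# R² as reported by sklearn, and the same quantity rebuilt from the two MSEs
r2 = r2_score(y_true, y_pred)
r2_from_mse = 1.0 - mse_model / mse_baseline

print(r2, r2_from_mse)
```

If this is right, then on a fixed dataset ranking models by MSE and ranking them by R² should give the same order, and R² > 0 just means "beats the mean baseline".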
Another reason I keep using them to prefer one regressor over another (even though I'm not sure it's appropriate) is that sklearn uses R² as the default score when GridSearchCV is run on a regressor.
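This is the behaviour I am referring to, as a small sketch on synthetic data. `Ridge`, the `alpha` grid, and the variable names are arbitrary choices of mine. As far as I can tell, leaving out `scoring` makes `GridSearchCV` fall back to the estimator's own `score` method, which is R² for regressors, while MSE has to be requested explicitly as `"neg_mean_squared_error"`.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

param_grid = {"alpha": [0.1, 1.0, 10.0]}

# No scoring argument: uses Ridge.score, i.e. R²
search_default = GridSearchCV(Ridge(), param_grid, cv=5).fit(X, y)

# Explicit R², for comparison with the default
search_r2 = GridSearchCV(Ridge(), param_grid, scoring="r2", cv=5).fit(X, y)

# Explicit (negated) MSE, so that "higher is better" still holds
search_mse = GridSearchCV(
    Ridge(), param_grid, scoring="neg_mean_squared_error", cv=5
).fit(X, y)

print(search_default.cv_results_["mean_test_score"])
print(search_r2.cv_results_["mean_test_score"])
print(search_mse.cv_results_["mean_test_score"])
```

The first two printed arrays should match, which is why I say R² is the default here.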