According to Rob Hyndman's book on forecasting (Section 5.3, "Selecting predictors"), when selecting predictors in a regression model:
Adjusted $R^2$
Computer output for regression will always give the $R^2$ value, discussed in Section 5.1. However, it is not a good measure of the predictive ability of a model. Imagine a model which produces forecasts that are exactly 20% of the actual values. In that case, the $R^2$ value would be 1 (indicating perfect correlation), but the forecasts are not very close to the actual values.
In addition, $R^2$ does not allow for "degrees of freedom". Adding any variable tends to increase the value of $R^2$ even if that variable is irrelevant. For these reasons, forecasters should not use $R^2$ to determine whether a model will give good predictions.
An equivalent idea is to select the model which gives the minimum sum of squared errors (SSE), given by
$SSE = \sum e_i^2$, with $e_i = y_i - \hat{y}_i$
Minimizing the SSE is equivalent to maximizing $R^2$ and will always choose the model with the most variables, and so is not a valid way of selecting predictors.
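To check that I understand the 20% example in the quote, here is a minimal sketch of my own (not from the book). It assumes that $R^2$ there means the squared correlation between the actual values and the forecasts, which is my reading of "indicating perfect correlation". The `actual` values are made up purely for illustration.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up "actual" values (an assumption for illustration only).
actual = [10.0, 12.0, 15.0, 11.0, 18.0, 20.0]

# Forecasts that are exactly 20% of the actual values, as in the quote.
forecast = [0.2 * y for y in actual]

# R^2 read as the squared correlation between actuals and forecasts.
r_squared = pearson_r(actual, forecast) ** 2

# Sum of squared forecast errors, e_i = y_i - yhat_i.
sse = sum((y - f) ** 2 for y, f in zip(actual, forecast))

print("squared correlation:", r_squared)
print("SSE:", sse)
```

Under that reading the squared correlation is 1 up to floating-point rounding, even though every forecast is off by 80% of the actual value.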
Now, based on the Wikipedia page on $R^2$:
$R^2 = 1 - \frac{SSE}{SST}$ , with $SST= \sum (y_i - \bar{y})^2$
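To see the behaviour the book describes, I tried a small simulation. This is only a sketch under my own assumptions: the data are simulated, the model is ordinary least squares with an intercept fitted via the normal equations, and the helper names `solve` and `fit_sse` are mine. It fits a sequence of nested models, each adding one pure-noise predictor, and records $SSE$ and $R^2 = 1 - SSE/SST$ on the training data.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_sse(X, y):
    """Fit OLS via the normal equations (X'X) beta = X'y and return the training SSE."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


random.seed(0)
n = 50

# Simulated data (assumption): y depends on x1 only.
x1 = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 + 3.0 * a + random.gauss(0, 1) for a in x1]

# Five irrelevant predictors: pure noise, unrelated to y by construction.
noise = [[random.gauss(0, 1) for _ in range(n)] for _ in range(5)]

y_bar = sum(y) / n
sst = sum((v - y_bar) ** 2 for v in y)

results = []  # (number of predictors, SSE, R^2)
for k in range(6):
    # Design matrix: intercept, x1, and the first k noise columns.
    X = [[1.0, x1[i]] + [noise[j][i] for j in range(k)] for i in range(n)]
    sse = fit_sse(X, y)
    results.append((1 + k, sse, 1 - sse / sst))

for num_predictors, sse, r2 in results:
    print(f"{num_predictors} predictors: SSE = {sse:.4f}, R^2 = {r2:.4f}")
```

In this sketch the training $SSE$ does not go up and $R^2$ does not go down as noise columns are added, which matches the quoted claim but does not explain it.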
My questions:
Based on the definitions of $R^2$ and $SSE$, I can't see why $R^2$ would always increase when the number of variables in the model increases, or why the $SSE$ would decrease. Why does this happen, as he says, even when the added variable is irrelevant?
Does this relation between SSE and number of variables hold only for linear models?
We use the $SSE$ all the time for training ML models in general, but by the reasoning quoted above that would seem to be a bad idea. Is this conclusion (that maximizing $R^2$ / minimizing $SSE$ is not a good measure of a model's predictive ability) specific to forecasting problems (i.e. when regression models are used specifically for forecasting), or does it apply to any type of regression model? And if it applies generally, why is the $SSE$ so widely used if it is such a bad idea?