
As stated in the title: how does one show mathematically that adding more variables to a linear regression can only increase (or at least never decrease) $R^2$?

user1769197

1 Answer


This is a general property of nested models: when you add parameters to a model (retaining all previous parameters), the fit cannot get worse, because merely setting the new parameters to zero makes the expanded model identical to the nested (restricted) model. Note that an additional parameter might not improve the fit, but it usually will, because by chance the added flexibility of the expanded model will fit at least some of the random noise in the data.

In the specific case of linear regression, the model $\hat{y} = \beta_0 +\beta_1 x_1 + \beta_2 x_2$ must fit the data at least as well as the restricted model $\hat{y} = \beta_0 +\beta_1 x_1 + 0 \cdot x_2$, which fixes $\beta_2 = 0$. Or, argue it this way: let $\beta_0^*$, $\beta_1^*$, and $\beta_2^*$ be the least-squares estimates, i.e. the values that minimize the sum of squared residuals and hence maximize $R^2$. Any other choice of coefficients yields an $R^2$ no larger than theirs; in particular, constraining $\beta_2 = 0$ cannot increase $R^2$, and will strictly decrease it unless $\beta_2^*$ happens already to be exactly $0$.
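A quick numerical sketch of this argument (not from the answer itself; the data here are made up, and the second predictor `x2` is pure noise, unrelated to `y`): fit by least squares with and without the extra variable and compare the two $R^2$ values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # pure noise, unrelated to y
y = 2.0 + 3.0 * x1 + rng.normal(size=n)  # true model uses only x1

def r_squared(X, y):
    """R^2 of a least-squares fit with intercept: 1 - SS_res / SS_tot."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

r2_restricted = r_squared(x1[:, None], y)             # x1 only
r2_expanded = r_squared(np.column_stack([x1, x2]), y)  # x1 and x2

# The expanded model can never do worse, even though x2 is noise.
print(r2_restricted, r2_expanded)
```

Even though `x2` carries no information about `y`, the expanded model's $R^2$ comes out at least as large as the restricted model's, typically slightly larger, exactly as the nesting argument predicts.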

John K. Kruschke