If I change the null hypothesis according to the suggestion in an answer, I get a different R$^2$. Why is the adjusted R$^2$ different in these two cases:
lm(y ~ x, offset= 1.00*x)
and
lm(y-x ~ x)
How is this possible, and which one is correct?
Both are valid summaries of the models, but they should differ because the models involve different responses.
The following analysis focuses on $R^2$, because the two $R^2$ values differ, too, and the adjusted $R^2$ is a simple function of $R^2$ (but a little more complicated to write).
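(For reference: with $n$ observations and $p$ estimated coefficients, including the intercept, the adjusted value is $1 - (1 - R^2)\frac{n-1}{n-p}$, so any difference in $R^2$ carries over directly to the adjusted $R^2$.)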
The first model is
$$\mathbb{E}(Y \mid x) = \alpha_0 + \alpha_1 x + x\tag{1}$$
where $\alpha_0$ and $\alpha_1$ are parameters to be estimated. The last term $x$ is the "offset": this merely means that term is automatically included and its coefficient (namely, $1$) will not be varied.
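To make this concrete, here is a minimal sketch in R on simulated data (the sample size, coefficients, and noise below are illustrative assumptions, not part of the question):

set.seed(1)                      # simulated data, for illustration only
x <- rnorm(50)
y <- 2 + 1.5 * x + rnorm(50)

# Model (1): the offset term 1.00*x enters the linear predictor with its
# coefficient fixed at 1; only alpha_0 and alpha_1 are estimated.
fit1 <- lm(y ~ x, offset = 1.00 * x)
coef(fit1)                       # estimates of alpha_0 and alpha_1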
The second model is
$$\mathbb{E}(Y-x\mid x) = \beta_0 + \beta_1 x\tag{2}$$
where $\beta_0$ and $\beta_1$ are parameters to be estimated. Linearity of expectation and the "taking out what is known" property of conditional expectations allow us to rewrite the left-hand side as the difference $\mathbb{E}(Y\mid x) - x$, and algebra lets us add $x$ to both sides to produce
$$\mathbb{E}(Y\mid x) = \beta_0 + \beta_1 x + x.$$
Thus the models are the same, and they are even parameterized identically, with $\alpha_i$ corresponding to $\beta_i$. As the output will attest, everything about their fits is the same: the coefficient estimates, their standard errors, the F statistic, and the p-values.

However, the predictions differ: model $(1)$ predicts $$\mathbb{E}(Y\mid x)$$ while model $(2)$ predicts $$\mathbb{E}(Y-x\mid x).$$ Therefore, in computing $R^2$--the "amount of variance explained"--the "amount of variance" refers to different quantities: $\operatorname{Var}(Y)$ in the first case and $$\operatorname{Var}(Y-x) = \operatorname{Var}(Y) + \operatorname{Var}(x) - 2\operatorname{Cov}(Y,x)$$ in the second.

Moreover, the predicted values differ, too. In the first model the predicted value of $\mathbb{E}(Y\mid x)$ for any $x$ is $$\hat y_1(x) = \hat\alpha_0 + (1 + \hat \alpha_1)x$$ (using, as is common, hats to designate estimated values of parameters), while in the second model the predicted value of $\mathbb{E}(Y-x\mid x)$ is $$\hat y_2(x) = \hat\beta_0 + \hat\beta_1 x = \hat\alpha_0 + \hat \alpha_1 x = \hat y_1(x) - x$$ (since the parameterizations correspond and the fits are the same).
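A quick check with the simulated data above illustrates all of this: the coefficient tables agree, while the reported $R^2$ and adjusted $R^2$ do not.

# Model (2): regress y - x on x.
fit2 <- lm(y - x ~ x)

# Identical estimates, standard errors, t statistics, and p-values:
all.equal(coef(summary(fit1)), coef(summary(fit2)))   # should be TRUE

# ... but the two R^2 (and adjusted R^2) values differ:
c(summary(fit1)$r.squared,     summary(fit2)$r.squared)
c(summary(fit1)$adj.r.squared, summary(fit2)$adj.r.squared)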
We can try to relate the two $R^2$. Let the data be $(x_1,y_1),\ldots, (x_n,y_n)$. For brevity, adopt vector notation $\mathbf{x} = (x_1,\ldots, x_n)$ and $\mathbf{y} = (y_1,\ldots, y_n)$. To distinguish variances and covariances of random variables in the models from properties of the data, for any $n$-vectors $\mathbf a$ and $\mathbf b$ write
$$\bar{\mathbf{a}}= \frac{1}{n}\left(a_1 + \cdots + a_n\right)$$ and
$$V(\mathbf{a}, \mathbf{b}) = \frac{1}{n-1}\left((a_1-\bar{\mathbf{a}})(b_1-\bar{\mathbf{b}}) + \cdots + (a_n-\bar{\mathbf{a}})(b_n-\bar{\mathbf{b}})\right).$$
Let $V(\mathbf{a}) = V(\mathbf{a},\mathbf{a})$ be a convenient shorthand.
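(In R, $V$ is nothing more than the sample covariance with denominator $n-1$; the little helper below, written only for this illustration, agrees with cov() and var().)

# V(a, b) as defined above is the usual sample covariance; V(a) is var(a).
V <- function(a, b = a) sum((a - mean(a)) * (b - mean(b))) / (length(a) - 1)
all.equal(V(x, y), cov(x, y))   # should be TRUE
all.equal(V(x),    var(x))      # should be TRUE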
In model $(1)$, the coefficient of determination is
$$R^2_1 = \frac{V(\hat{y}_1(\mathbf{x}))}{V(\mathbf{y})}$$
while in model $(2)$ it is
$$\begin{aligned} R^2_2 &= \frac{V(\hat{y}_2(\mathbf{x}))}{V(\mathbf{y} - \mathbf{x})}\\ &=\frac{V(\hat{y}_1(\mathbf{x}) - \mathbf{x})}{V(\mathbf{y} - \mathbf{x})}\\ &=\frac{V(\hat{y}_1(\mathbf{x})) + V(\mathbf{x}) - 2V(\hat{y}_1(\mathbf{x}), \mathbf{x})}{V(\mathbf{y}) + V(\mathbf{x}) - 2V(\mathbf{y}, \mathbf{x})}. \end{aligned}$$
We can see $R^2_1$ lurking in the numerator in the form $V(\hat{y}_1(\mathbf{x})) = V(\mathbf{y}) R^2_1$, but no general simplification is evident. Indeed, we cannot even say in general which $R^2$ is greater than the other, even though the models give identical predictions of $\mathbb{E}(Y)$.
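Continuing the sketch, we can reproduce both values from these formulas; here $\hat y_1(\mathbf x)$ is computed directly from the fitted coefficients as $\hat\alpha_0 + (1+\hat\alpha_1)x$, and the results should agree with what summary() reports for the two fits.

# hat(y)_1(x) = alpha0_hat + (1 + alpha1_hat) * x, as derived above
yhat1 <- coef(fit1)[1] + (1 + coef(fit1)[2]) * x

R2.1 <- V(yhat1)     / V(y)       # model (1): share of V(y)
R2.2 <- V(yhat1 - x) / V(y - x)   # model (2): share of V(y - x)

c(R2.1, summary(fit1)$r.squared)  # should agree
c(R2.2, summary(fit2)$r.squared)  # should agree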
These considerations suggest that $R^2$ might be overinterpreted in many situations. In particular, as a measure of "goodness of fit" it leaves much to be desired. Although it has its uses--it is a basic ingredient in many informative regression statistics--its meaning and interpretation might not be as straightforward as they would seem.