
Problem
Given a linear model $y_i = \beta_1 + \beta_2 x_i +\epsilon_i, \quad i = 1, \dots, n$
I need to compare the variance of the ordinary least squares estimator of $\beta_2$ without the restriction and the variance of the ordinary least squares estimator of $\beta_2$ under the linear restriction $\beta_1 = 0$ (i.e. $\Bbb{Var}(\beta_2^U)$ and $\Bbb{Var}(\beta_2^R)$).
Is $\beta_2^R$ unbiased, and are there any violations of the Gauss–Markov theorem?

My ideas
Intuitively, the restricted variance should be smaller than the unrestricted variance, because restrictions reduce the flexibility of a model and hence reduce its variance.
However, I need a mathematical proof for the problem above.
Update
With help from comments I have obtained that $$ Var(\beta_2^{UR}) = \frac{\sum(y_i - \beta_1- \beta_2x_i)^2/(n-2)}{\sum(x_i - \bar{x})^2}$$ and $$ Var(\beta_2^{R}) = \frac{\sum(y_i - \beta_2x_i)^2/(n-2)}{\sum(x_i )^2}$$

Bruh
  • Start with the formulas for the variances (they are simple). Where does that take you? – whuber Dec 03 '20 at 12:58
  • If I'm not mistaken $ \Bbb Var(\hat{\beta_2}) = \frac{s_u^2}{\sum(x_i - \bar{x})^2} $ where $s_u^2 = \frac{\sum e^2}{n-2}$. Do you mean these formulas? – Bruh Dec 03 '20 at 14:03
  • That's one of them. You also need the formula for the second model where $\beta_1$ does not appear. – whuber Dec 03 '20 at 14:21
  • The problem is that I could not find the formula. According to the formula above, the variance of the slope does not depend on whether the intercept is absent... But it must depend on that. – Bruh Dec 03 '20 at 14:48
  • https://stats.stackexchange.com/search?q=regression+variance+formula – whuber Dec 03 '20 at 15:59
  • Actually, I have tried it. I did find it, but in matrix form: https://stats.stackexchange.com/questions/44838/how-are-the-standard-errors-of-coefficients-calculated-in-a-regression – Bruh Dec 03 '20 at 16:06
  • Since the problem with $\beta_1=0$ has a single parameter, all square matrices in the matrix form are just *numbers.* – whuber Dec 03 '20 at 16:07
  • Just to make it clear - a transposed number is just a number, isn't it? If yes, then $\Bbb Var = \frac{\hat{\sigma}^2}{X^2}$? – Bruh Dec 03 '20 at 16:09
  • @whuber thanks! I realized that the restricted variance is $\frac{s^2_u}{\sum x_i^2}$. However, it's still not obvious to me why $\frac{s^2_u}{\sum x_i^2}$ is less than $\frac{s^2_u}{\sum (x_i - \bar{x})^2}$ – Bruh Dec 04 '20 at 07:46
  • Beware your notation: you use "$s_u$" for two different quantities! You need to express each of the (different) $s_u^2$ in terms of the data. – whuber Dec 04 '20 at 14:09
  • Isn't $\hat{\sigma}^2$ just the variance of the error term i.e $s_u^2$? – Bruh Dec 05 '20 at 14:47
  • Two different models, two *different* error terms. – whuber Dec 05 '20 at 15:09
  • Yeah, I understood what you meant. Still, it has not become easier to compare $\frac{\sum(y_i - \beta_1- \beta_2x_i)^2/(n-2)}{\sum(x_i - \bar{x})^2}$ and $\frac{\sum(y_i - \beta_2x_i)^2/(n-2)}{\sum(x_i)^2}$ – Bruh Dec 06 '20 at 05:39
  • Re the edit: your formula for the variance in the restricted case is incorrect (and this error is fundamental to answering your question): you must divide the sum of squares of residuals by $n-1$ rather than $n-2.$ This makes it possible, in special cases, for the variance of the restricted coefficient to be *less* than the variance of the unrestricted coefficient. You can see for yourself by running some random examples. Here is `R` code to compare the variances: `x – whuber Dec 07 '20 at 16:43
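The `R` code in the last comment above is cut off. A minimal sketch of the kind of simulation it suggests might look like the following (the seed, sample size, and coefficient values here are my own assumptions, with the true intercept set to zero):

```r
# Compare estimated variances of the slope with and without an intercept.
set.seed(1)
n <- 20
x <- rnorm(n, mean = 2)   # nonzero mean, so sum(x^2) > sum((x - mean(x))^2)
y <- 3 * x + rnorm(n)     # true intercept is zero here (an assumption)

fit_u <- lm(y ~ x)        # unrestricted model (free intercept)
fit_r <- lm(y ~ x - 1)    # restricted model (intercept forced to zero)

# Estimated sampling variances of the slope coefficient in each model
c(unrestricted = vcov(fit_u)["x", "x"],
  restricted   = vcov(fit_r)["x", "x"])
```

Rerunning with different seeds (or with a nonzero true intercept) shows that the comparison between the *estimated* variances can go either way, consistent with whuber's remark about special cases.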

1 Answer


You can find the general formula for the variance of the regression coefficients in many questions on this site (e.g., here). To facilitate our analysis, let $\sigma^2 = \mathbb{V}(\epsilon_i)$ denote the error variance. In a model with an intercept term (i.e., allowing free $\beta_1$) you have the variance:

$$\mathbb{V}(\hat{\beta}_2^U) = \frac{\sigma^2}{\sum x_i^2 - n \bar{x}^2}.$$

In a model without an intercept term (i.e., setting $\beta_1=0$) you have the variance:

$$\mathbb{V}(\hat{\beta}_2^R) = \frac{\sigma^2}{\sum x_i^2}.$$
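
Comparing the two expressions is now a one-line argument: since $\sum x_i^2 - n \bar{x}^2 = \sum (x_i - \bar{x})^2 \leq \sum x_i^2$, the restricted estimator has the (weakly) larger denominator, so

$$\mathbb{V}(\hat{\beta}_2^R) \leq \mathbb{V}(\hat{\beta}_2^U),$$

with equality if and only if $\bar{x} = 0$.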

Both of these results can be derived from the general form $\mathbb{V}(\boldsymbol{\hat{\beta}}) = \sigma^2 (\mathbf{x}^\text{T} \mathbf{x})^{-1}$ using the relevant design matrix $\mathbf{x}$ for the models with/without the intercept term (i.e., with/without a column of ones).
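
For instance, in the restricted model the design matrix is the single column $\mathbf{x} = (x_1, \dots, x_n)^\text{T}$, so $\mathbf{x}^\text{T} \mathbf{x} = \sum x_i^2$ is just a number (as noted in the comments above), giving

$$\mathbb{V}(\hat{\beta}_2^R) = \sigma^2 (\mathbf{x}^\text{T} \mathbf{x})^{-1} = \frac{\sigma^2}{\sum x_i^2}.$$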

As to your latter question of whether $\hat{\beta}_2^R$ is biased, have a look at the theory of omitted variable bias. If the true intercept of the model is zero then, intuitively, assuming it is zero should not bias the estimator, and should improve our estimation. Conversely, if the true intercept is not zero then we would expect that assuming it to be zero might cause some problems. The formula for omitted variable bias should allow you to write the bias of your estimator as a function of the (unknown) true intercept term.
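
A sketch of that calculation: the restricted estimator is $\hat{\beta}_2^R = \sum x_i y_i / \sum x_i^2$, and substituting the true model $y_i = \beta_1 + \beta_2 x_i + \epsilon_i$ gives

$$\mathbb{E}(\hat{\beta}_2^R) = \frac{\sum x_i (\beta_1 + \beta_2 x_i)}{\sum x_i^2} = \beta_2 + \beta_1 \frac{\sum x_i}{\sum x_i^2},$$

so the bias vanishes exactly when $\beta_1 = 0$ or $\sum x_i = 0$.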


Some final notes on your working: It is worth pointing out that you are using non-standard notation for the intercept and slope terms in the model --- usually we would denote these as $\beta_0$ and $\beta_1$ respectively. Another thing to note is that the variance equations you have written cannot possibly be correct, firstly because they include the random variable of interest in them, and secondly because they do not include any reference to the variability of the error term in the model.

Ben
  • Just to make it clear, $\hat{\sigma}^2$ for restricted and unrestricted models are the same, right? – Bruh Dec 08 '20 at 09:01
  • The equations in my answer use the true variance $\sigma^2$, but the estimator depends on the residual mean-square, which is different in the two models. – Ben Dec 08 '20 at 10:10
  • The quantity $\sigma$ in the "OLS estimator under the restriction $\beta_0=0$" has a very different meaning than the same $\sigma$ under the full model! – whuber Dec 08 '20 at 16:45
  • Not really --- in both cases it is the standard deviation of the error term. Certainly the model form is different, so its interpretation is different, but it is still the error standard deviation. – Ben Dec 08 '20 at 20:32
  • @Ben, it seems I managed to get the formula for omitted variable bias in this task: $$ E(\hat{\beta_2}^R) = \beta_2 + \beta_1 \cdot \frac{\sum x_i}{\sum x_i^2}$$ So it will be unbiased if the true $\beta_1$ is zero or when $\sum x_i = 0$. Am I right? – Bruh Dec 13 '20 at 08:09