
I am reading Mathematical Statistics with Applications by Wackerly et al. (7th edition). In Chapter 11, the book discusses linear models and least squares, specifically $Y = \beta_0+\beta_1x+\epsilon$ where $E(\epsilon)=0$ and $Var(\epsilon)=\sigma^2$, with $E(Y)=\beta_0+\beta_1x$ being deterministic.

The book derives $\hat\beta_0=\bar Y - \hat\beta_1\bar x$ and $\hat\beta_1=\frac{\sum(x_i-\bar x)(Y_i-\bar Y)}{\sum(x_i - \bar x)^2}$ as estimators of $\beta_0$ and $\beta_1$, respectively, by minimizing the SSE. The book states that if $\epsilon$ is normally distributed, then each $Y_i$ in an independent sample is normally distributed, and $\hat\beta_0$ and $\hat\beta_1$ are also normally distributed since they are linear combinations of the $Y_i$. I am assuming this is justified by this proof?

However, the book later goes further and states that $\hat\beta_0$ and $\hat\beta_1$ being normal implies that a linear combination $\hat\theta = a_0\hat\beta_0 + a_1\hat\beta_1$ is also normal. This confuses me: since $\mathrm{Cov}(\hat\beta_0,\hat\beta_1)=\frac{-\bar x\sigma^2}{\sum(x_i-\bar x)^2}$ may not be zero, it seems that for $\hat\theta$ to be normal the joint distribution of $\hat\beta_0$ and $\hat\beta_1$ must be (bivariate) normal (per this proof), yet the book makes no reference to the joint distribution. Is it always true that $\hat\theta$ is normally distributed? If so, how is it shown?
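
To make my confusion concrete, here is a small simulation sketch (the parameter values and design points below are arbitrary, not from the book): it refits the model many times, so the empirical covariance of $\hat\beta_0$ and $\hat\beta_1$ should track the formula above, and one can inspect whether $\hat\theta$ still looks normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary "true" values, for illustration only
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = np.linspace(0, 10, 20)          # fixed design points
a0, a1 = 1.0, 3.0                   # coefficients defining theta-hat
n_rep = 10_000

b0_hat = np.empty(n_rep)
b1_hat = np.empty(n_rep)
for r in range(n_rep):
    Y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    b1 = np.sum((x - x.mean()) * (Y - Y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = Y.mean() - b1 * x.mean()
    b0_hat[r], b1_hat[r] = b0, b1

theta_hat = a0 * b0_hat + a1 * b1_hat

# Empirical covariance vs. the textbook formula -xbar * sigma^2 / sum((x_i - xbar)^2)
print(np.cov(b0_hat, b1_hat)[0, 1])
print(-x.mean() * sigma**2 / np.sum((x - x.mean()) ** 2))

# Standardized theta-hat: skewness ~ 0 and kurtosis ~ 3 would be consistent with normality
z = (theta_hat - theta_hat.mean()) / theta_hat.std()
print(np.mean(z**3), np.mean(z**4))
```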

Yandle
  • It's all the same proof and it's simple: $(\hat\beta_0,\hat\beta_1)$ is a linear transformation of the jointly normal vector $\epsilon$ and therefore is jointly normal, *QED.* – whuber Jan 08 '20 at 05:34
  • You can see all the linked threads here: https://stats.stackexchange.com/questions/347628/joint-distribution-of-least-square-estimates-hat-alpha-hat-beta-in-a-simpl. – StubbornAtom Jan 08 '20 at 07:03

1 Answer


The answer is yes. This is because the least squares estimates are themselves jointly normally distributed, which can be seen by applying the linear transformation theorem for the multivariate normal distribution (Proof):

$$ x \sim \mathcal{N}(\mu,\Sigma) \quad \Rightarrow \quad y = Ax + b \sim \mathcal{N}(A\mu + b, A \Sigma A^\mathrm{T}) $$
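
For completeness, here is a short sketch of why the theorem holds, via the characteristic function of the multivariate normal, $\varphi_x(s) = \exp\left( i s^\mathrm{T}\mu - \tfrac{1}{2} s^\mathrm{T}\Sigma s \right)$:

$$ \varphi_y(t) = \mathrm{E}\!\left[ e^{i t^\mathrm{T}(Ax+b)} \right] = e^{i t^\mathrm{T} b}\,\varphi_x(A^\mathrm{T}t) = \exp\!\left( i t^\mathrm{T}(A\mu+b) - \tfrac{1}{2}\, t^\mathrm{T} A\Sigma A^\mathrm{T} t \right), $$

which is exactly the characteristic function of $\mathcal{N}(A\mu+b, A\Sigma A^\mathrm{T})$.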

Using matrix notation for linear regression, the distribution of the data is:

$$ y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n) $$

$$ \Rightarrow \quad y \sim \mathcal{N}(X\beta, \sigma^2 I_n) $$

If we apply ordinary least squares (Proof), the distribution of the estimates is:

$$ \hat{\beta} = (X^\mathrm{T}X)^{-1} X^\mathrm{T}y $$

$$ \Rightarrow \hat{\beta} \sim \mathcal{N}\left((X^\mathrm{T}X)^{-1} X^\mathrm{T}X\beta, \sigma^2 (X^\mathrm{T}X)^{-1} X^\mathrm{T} X (X^\mathrm{T}X)^{-1} \right) = \mathcal{N}\left( \beta, \sigma^2 (X^\mathrm{T}X)^{-1} \right) $$

Then, a linear combination of the least squares estimates is distributed as:

$$ \Rightarrow c^\mathrm{T} \hat{\beta} \sim \mathcal{N}\left( c^\mathrm{T} \beta, \sigma^2 c^\mathrm{T} (X^\mathrm{T}X)^{-1} c \right) $$

In your special case (a.k.a. "simple linear regression"), the quantities would be:

$$ X = \begin{bmatrix} 1_n & x \end{bmatrix}, \; \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \; c = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}, \; c^\mathrm{T} \hat{\beta} = \hat{\theta} $$

$$ ( \text{and of course} \quad y = Y, \; \varepsilon = \epsilon ) $$
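
For a quick numerical illustration of this special case (the data below are made up, not from the book or the question), one can build $X = [\,1_n \;\; x\,]$, compute $\hat{\beta}$ and $\sigma^2 (X^\mathrm{T}X)^{-1}$ from the closed forms above, and compare the off-diagonal entry with the covariance formula quoted in the question:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data for illustration
n = 30
x = rng.uniform(0, 10, size=n)
beta = np.array([2.0, 0.5])                   # [beta_0, beta_1]
sigma = 1.0
X = np.column_stack([np.ones(n), x])          # X = [1_n  x]
y = X @ beta + rng.normal(0.0, sigma, size=n)

# OLS estimate and its covariance matrix sigma^2 (X'X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
cov_beta_hat = sigma**2 * XtX_inv

# Linear combination theta-hat = a0 * beta0_hat + a1 * beta1_hat and its variance
c = np.array([1.0, 3.0])                      # [a_0, a_1]
theta_hat = c @ beta_hat
var_theta_hat = sigma**2 * c @ XtX_inv @ c    # sigma^2 c' (X'X)^{-1} c

print(beta_hat, theta_hat, var_theta_hat)

# Off-diagonal of cov_beta_hat equals -xbar * sigma^2 / sum((x_i - xbar)^2) from the question
print(cov_beta_hat[0, 1], -x.mean() * sigma**2 / np.sum((x - x.mean()) ** 2))
```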

Joram Soch
  • Just discovered that my answer is roughly equivalent to this one: https://stats.stackexchange.com/a/347642/270304; and this one: https://stats.stackexchange.com/a/133333/270304. – Joram Soch Jan 08 '20 at 07:25
  • For the case where $X^TX$ is not invertible, does this still hold? I considered substituting the general pseudoinverse of $X$ to replace occurrences of $(X^TX)^{-1}X^T$ but I'm actually not sure whether the pseudoinverse in this context is always non-singular. Per [Lemma 5](https://mast.queensu.ca/~stat353/slides/5-multivariate_normal17_4.pdf) it seems the joint distribution of $\hat\beta$ doesn't exist when the covariance matrix is singular. – Yandle Jan 09 '20 at 05:58
  • Yes. Ordinary least squares requires that the design matrix has full rank, see here: https://stats.stackexchange.com/a/138342/270304. – Joram Soch Jan 09 '20 at 14:20