2

Let $Y_1,Y_2,\ldots,Y_n$ be independently distributed random variables such that $Y_i\sim\mathcal N(\alpha+\beta x_i,\sigma^2)$ for all $i=1,\ldots,n$. If $\hat\alpha$ and $\hat\beta$ are the least squares estimates of $\alpha$ and $\beta$ respectively, what is the joint distribution of $(\hat\alpha,\hat\beta)$?

We consider the model $y=\alpha+\beta x+\epsilon$ where $y$ is stochastic and $x$ is non-stochastic.

We have the paired observations $(x_i,y_i)$ and we assume that the errors $\epsilon_i\stackrel{\text{i.i.d.}}{\sim}\mathcal{N}(0,\sigma^2)$ for all $i$.

Define $s_{xx}=\sum (x_i-\bar x)^2,\,s_{yy}=\sum(y_i-\bar y)^2$ and $s_{xy}=\sum(x_i-\bar x)(y_i-\bar y)$.

From the normal equations we have $\hat\alpha=\bar y-\hat\beta\bar x$ and $\hat\beta=\dfrac{s_{xy}}{s_{xx}}$.
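
(For reference, these follow from setting the partial derivatives of $\sum_i(y_i-\alpha-\beta x_i)^2$ with respect to $\alpha$ and $\beta$ to zero, which gives the normal equations

$$\sum_i y_i=n\alpha+\beta\sum_i x_i,\qquad \sum_i x_iy_i=\alpha\sum_i x_i+\beta\sum_i x_i^2,$$

and solving this pair yields the expressions above.)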

Transform $\mathbf Y=(Y_1,\ldots,Y_n)^T$ to $\mathbf Z=(Z_1,\ldots,Z_n)^T=\mathbf{A}\,\mathbf Y$, where $\mathbf A$ is an orthogonal matrix whose first two rows are $\left(\frac{1}{\sqrt n},\ldots,\frac{1}{\sqrt n}\right)$ and $\left(\frac{x_1-\bar x}{\sqrt{s_{xx}}},\ldots,\frac{x_n-\bar x}{\sqrt{s_{xx}}}\right)$.
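
To make the next step explicit (a sketch): since $\mathbf A$ is orthogonal and $\mathbf Y\sim\mathcal N_n(\boldsymbol\mu,\sigma^2\mathbf I_n)$ with $\boldsymbol\mu=(\alpha+\beta x_1,\ldots,\alpha+\beta x_n)^T$, we get

$$\mathbf Z=\mathbf{A}\mathbf Y\sim\mathcal N_n(\mathbf A\boldsymbol\mu,\sigma^2\mathbf A\mathbf A^T)=\mathcal N_n(\mathbf A\boldsymbol\mu,\sigma^2\mathbf I_n),$$

so the $Z_i$'s are independent normals with common variance $\sigma^2$. The choice of the first two rows gives $Z_1=\sqrt n\,\bar y$ and $Z_2=\sum_i(x_i-\bar x)y_i/\sqrt{s_{xx}}=\sqrt{s_{xx}}\,\hat\beta$.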

From the distribution of the $Z_i$'s, one can show that $\bar y\sim\mathcal N\left(\alpha+\beta\bar x,\frac{\sigma^2}{n}\right)$ and $\hat\beta\sim\mathcal N\left(\beta,\frac{\sigma^2}{s_{xx}}\right)$, and that these two are independent of each other.

From this, since $\hat\alpha=\bar y-\hat\beta\bar x$ with $\bar y$ and $\hat\beta$ independent, one gets $E(\hat\alpha)=\alpha+\beta\bar x-\bar x\beta=\alpha$ and $\operatorname{Var}(\hat\alpha)=\operatorname{Var}(\bar y)+\bar x^2\operatorname{Var}(\hat\beta)$, so $\hat\alpha\sim\mathcal{N}\left(\alpha,\sigma^2\left(\frac{1}{n}+\frac{\bar x^2}{s_{xx}}\right)\right)$.

So each of the two least squares estimates has a univariate normal distribution. They are not independent in general; I have found that they have correlation $\text{Corr}(\hat\alpha,\hat\beta)=-\frac{\sqrt{n}\,\bar x}{\sqrt{\sum x_i^2}}$.
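
A quick simulation makes this correlation easy to check numerically (a sketch; the sample size, $x$ values and parameters below are arbitrary choices, not from the problem):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative choices (not taken from the question)
n, alpha, beta, sigma = 20, 1.0, 2.0, 1.5
x = np.linspace(0.0, 5.0, n)
reps = 100_000

# Simulate y_i = alpha + beta*x_i + eps_i many times and compute the
# least squares estimates for each simulated data set
eps = rng.normal(0.0, sigma, size=(reps, n))
y = alpha + beta * x + eps                      # shape (reps, n)
xbar = x.mean()
sxx = ((x - xbar) ** 2).sum()
beta_hat = ((x - xbar) * y).sum(axis=1) / sxx   # slope estimates
alpha_hat = y.mean(axis=1) - beta_hat * xbar    # intercept estimates

empirical = np.corrcoef(alpha_hat, beta_hat)[0, 1]
theoretical = -np.sqrt(n) * xbar / np.sqrt((x ** 2).sum())
print(empirical, theoretical)  # the two values should be close
```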

But how can I find the joint distribution of $(\hat\alpha,\hat\beta)$ from this? Marginal normality alone does not let me conclude that they are jointly normal. They probably are jointly normal, but how does that follow?

Some related posts came up during a search, but they don't quite give the answer I am looking for.

Edit.

The joint distribution of $(\hat\alpha,\hat\beta)=(\bar y-\hat\beta\bar x,\hat\beta)$ is bivariate normal simply because $\bar y$ and $\hat\beta$ are independent normal variables (details here): independent normal variables are jointly normal, and linear combinations of jointly normal variables are themselves jointly normal.
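
To spell this out a little (a sketch of the moment-generating-function argument): for any $(t_1,t_2)$, using $\hat\alpha=\bar y-\hat\beta\bar x$ and the independence of $\bar y$ and $\hat\beta$,

$$E\!\left[e^{t_1\hat\alpha+t_2\hat\beta}\right]=E\!\left[e^{t_1\bar y}\right]E\!\left[e^{(t_2-t_1\bar x)\hat\beta}\right]=\exp\!\left\{t_1(\alpha+\beta\bar x)+\tfrac{\sigma^2 t_1^2}{2n}\right\}\exp\!\left\{(t_2-t_1\bar x)\beta+\tfrac{\sigma^2 (t_2-t_1\bar x)^2}{2\,s_{xx}}\right\},$$

which has the form $\exp\{\text{linear in }(t_1,t_2)+\tfrac12\,\text{quadratic form in }(t_1,t_2)\}$, i.e. the MGF of a bivariate normal distribution with the means, variances and covariance obtained above.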

StubbornAtom
  • In a linear regression you get the covariance matrix of all parameters, did you look at it? – Aksakal May 22 '18 at 19:20
  • [Here](https://en.wikipedia.org/wiki/Ordinary_least_squares#Assuming_normality) is the answer to your question. Do you understand it or need explanation? – Aksakal May 22 '18 at 19:24
  • I don't think that is a duplicate. This question seems to be actually asking how to derive the sampling distribution, especially how to show that it's multivariate normal (if it is). The duplicate, despite the title, seems to be more about conceptual differences between the sampling distribution and Bayesian posterior distributions of the coefficients – Juho Kokkala May 23 '18 at 05:11
  • Since the estimates are *explicitly* a linear transformation of the responses, and it is well-known (and quoted extensively on this site) that linear transformations of multivariate Normal variables are Normal, the result follows from *any* multivariate normality assumption of the response. I'm sure this point has been made in many threads, although it would be hard to search for a specific answer given the ubiquity of references to normality and regression. – whuber Jun 13 '18 at 12:58
  • A generalization of your question is asked and answered at https://stats.stackexchange.com/questions/133312. – whuber Jun 13 '18 at 13:17
  • @StubbornAtom Can you please answer it then? I am stuck on the same problem and cannot see how to conclude that $\hat{\alpha}$ and $\hat{\beta}$ are bivariate normal. I cannot understand the matrix notation used here. – user587389 Jul 09 '20 at 14:50
  • @user587389 See edit. – StubbornAtom Jul 09 '20 at 15:17
  • @StubbornAtom "Different linear combinations of independent normal variables are themselves jointly normal"... Is there any proof or derivation? – user587389 Jul 09 '20 at 15:35
  • @user587389 Easy to verify using moment generating functions or characteristic functions. This is a standard result concerning multivariate normal distribution, so I guess no proof should be required in this particular context. – StubbornAtom Jul 09 '20 at 15:42

2 Answers

3

In matrix notation, write the model as $Y = X\theta + U$ with $U \sim N_n(0, \sigma^2 \mathbf{I}_n)$, where $\mathbf{I}_n$ is the identity matrix of size $n$ (the number of observations). The least squares estimator is then \begin{equation} \hat{\theta} = (X^\prime X)^{-1} X^\prime Y = \theta + (X^\prime X)^{-1} X^\prime U. \end{equation}

Treating $X$ as known (non-stochastic), it follows that \begin{equation} E[\hat{\theta}] = \theta + (X^\prime X)^{-1}X^\prime E[U] = \theta \end{equation} and \begin{equation} Var[\hat{\theta}] = (X^\prime X)^{-1}X^\prime Var[U]\, X(X^\prime X)^{-1} = \sigma^2 (X^\prime X)^{-1}. \end{equation} Since $\hat{\theta}$ is a linear function of the normal vector $U$, it follows that \begin{equation} \hat{\theta} \mid X \sim N(\theta, \sigma^2 (X^\prime X)^{-1}). \end{equation}

In the simple linear regression case considered here, \begin{equation} \theta=\left[\begin{array}{c} \alpha\\ \beta \end{array}\right] \end{equation} and \begin{equation} X=\left[\begin{array}{cc} 1 & x_{1}\\ \vdots & \vdots\\ 1 & x_{n} \end{array}\right] \end{equation}

If you compute $(X^\prime X)^{-1}$ explicitly, you obtain the covariance matrix of $\hat{\theta}$ and hence the covariance between $\hat{\alpha}$ and $\hat{\beta}$, which is the off-diagonal element of $\sigma^2 (X^\prime X)^{-1}$.
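
For instance, here is a small numerical sketch (with made-up $x$ values and $\sigma$, none of which come from the question) showing that $\sigma^2(X^\prime X)^{-1}$ reproduces the familiar simple-regression variances and covariance:

```python
import numpy as np

# Made-up inputs for illustration
x = np.array([0.5, 1.2, 2.0, 3.3, 4.1, 5.0])
sigma = 2.0
n = len(x)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x])

# Covariance matrix of (alpha_hat, beta_hat): sigma^2 (X'X)^{-1}
cov = sigma**2 * np.linalg.inv(X.T @ X)

# Closed-form expressions from the simple-regression formulas
sxx = ((x - x.mean())**2).sum()
var_alpha = sigma**2 * (1/n + x.mean()**2 / sxx)
var_beta = sigma**2 / sxx
cov_ab = -sigma**2 * x.mean() / sxx

print(cov)
print(var_alpha, var_beta, cov_ab)  # should match the entries of cov
```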

1

The joint distribution of model parameters in ordinary least squares (OLS) regression is described here and here. The formula is:

$$\sqrt{n}\,(\hat\beta - \beta)\ \xrightarrow{d}\ \mathcal{N}\big(0,\;\sigma^2 Q_{xx}^{-1}\big)$$ where $Q_{xx} = \lim_{n\to\infty}\frac{1}{n} X^T X$ and $X$ is the design matrix. (With normal errors, the exact finite-sample analogue is $\hat\beta \sim \mathcal N\big(\beta, \sigma^2 (X^T X)^{-1}\big)$.)

Note that in this case we do not assume the errors are Gaussian: in large samples the CLT kicks in, so normality is not needed to obtain the (asymptotic) distribution.

In your case $\hat\alpha=\hat\beta_0$ and $\hat\beta=\hat\beta_1$, and the design matrix $X$ has two columns: ones (intercept) and the variable $x$, i.e. row $i$ is $(X_{i0},X_{i1})=(1,x_i)$.
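
For example, a small sketch of that design matrix with arbitrary made-up data, using `np.linalg.lstsq` as the OLS routine:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 0.5 + 1.5 * x + rng.normal(0.0, 1.0, size=x.size)

# Design matrix: a column of ones (intercept) and the variable x
X = np.column_stack([np.ones_like(x), x])

# OLS fit: beta0_hat plays the role of alpha_hat, beta1_hat of beta_hat
beta0_hat, beta1_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Closed-form simple-regression estimates for comparison
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
a = y.mean() - b * x.mean()
print(beta0_hat, beta1_hat, a, b)  # the two pairs should agree
```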

Aksakal
  • I am rather looking for this bivariate normal distribution stated in this post: https://stats.stackexchange.com/questions/61235/sampling-distribution-of-regression-coefficients?noredirect=1&lq=1. Is this the same as what you have written? Don't know if this is the answer I *should* be looking for though. – StubbornAtom May 22 '18 at 19:46
  • It's the same thing – Aksakal May 22 '18 at 19:49