I previously learned about sampling distributions, which give results for the estimator in terms of the unknown parameter. For example, the sampling distributions of $\hat\beta_0$ and $\hat\beta_1$ in the linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ are
$$ \hat{\beta}_0 \sim \mathcal N \left(\beta_0,~\sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)\right) $$ and $$ \hat{\beta}_1 \sim \mathcal N \left(\beta_1,~\frac{\sigma^2}{S_{xx}}\right) $$
where $S_{xx} = \sum_{i=1}^n x_i^2 - n \bar{x}^2$.
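To make sure I understand these, I ran a quick simulation (a NumPy sketch of my own; the variable names and parameter values are mine): repeatedly generating $y$ from the model with a fixed design and checking that the empirical variance of $\hat\beta_1$ across replications matches $\sigma^2 / S_{xx}$.

```python
import numpy as np

# Sketch: simulate many data sets from the model and check that the
# empirical variance of beta1-hat matches sigma^2 / S_xx.
rng = np.random.default_rng(0)
n, beta0, beta1, sigma = 50, 2.0, 3.0, 1.5
x = rng.uniform(0, 10, size=n)            # fixed design across replications
Sxx = np.sum(x**2) - n * x.mean()**2

b1_hats = []
for _ in range(20_000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * y) / Sxx  # least-squares slope
    b1_hats.append(b1)

print(np.var(b1_hats))   # empirical variance over replications
print(sigma**2 / Sxx)    # theoretical variance sigma^2 / S_xx
```

The two printed values agree to within simulation noise, which matches the second display above.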
But now I have seen the following in a book:
Suppose we fit the model by least squares in the usual way. Consider the Bayesian posterior distribution, and choose priors so that this is equivalent to the usual frequentist sampling distribution, that is:
$$ \left( \begin{matrix} \beta_0 \\ \beta_1 \end{matrix} \right) \sim \mathcal N_2\left[\left(\begin{matrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{matrix} \right),~\hat{\sigma}^2 \left(\begin{matrix} n & \sum_{i=1}^{n}x_i \\ \sum_{i=1}^{n}x_i & \sum_{i=1}^{n}x_i^2 \end{matrix} \right) ^{-1}\right] $$
This is confusing me because:
- Why do the estimates appear on the left-hand side (LHS) of the first two expressions, but on the right-hand side (RHS) of the last expression?
- Why do the beta hats in the last expression have subscripts 1 and 2 instead of 0 and 1?
- Are these just different representations of the same thing? If they are, could someone show me how they are equivalent? If not, could someone explain the difference?
- Is it the case that the last expression is the "inversion" of the first two? Is that why the $2\times 2$ matrix in the last expression is inverted and the estimates/parameters are switched RHS$\leftrightarrow$LHS? If so, could someone show me how to get from one to the other?
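As a partial sanity check on the third question, I tried comparing the two covariance forms numerically (another NumPy sketch of my own, plugging the true $\sigma^2$ in place of $\hat\sigma^2$): computing the diagonal of $\sigma^2 (X^\top X)^{-1}$ and comparing it against the variance expressions in the first two displays.

```python
import numpy as np

# Sketch: does the diagonal of sigma^2 * (X'X)^{-1} reproduce the
# closed-form variances of beta0-hat and beta1-hat?
rng = np.random.default_rng(1)
n, sigma2 = 30, 2.0
x = rng.normal(5, 2, size=n)
xbar = x.mean()
Sxx = np.sum(x**2) - n * xbar**2

# The 2x2 matrix from the book's expression: X'X for the design [1, x_i]
XtX = np.array([[n,       x.sum()],
                [x.sum(), np.sum(x**2)]])
cov = sigma2 * np.linalg.inv(XtX)

print(cov[0, 0], sigma2 * (1/n + xbar**2 / Sxx))  # var(beta0-hat)?
print(cov[1, 1], sigma2 / Sxx)                    # var(beta1-hat)?
```

The diagonal entries agree with the closed-form variances to floating-point precision, so numerically at least the marginal variances coincide; what I am missing is the algebra connecting the two, and the role of the off-diagonal entries.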