
Given a linear model $\mathbf{y} = \mathbf{X}\beta + \epsilon$, it is well known that the estimate for $\beta$ that gives the minimum residual sum of squares (RSS) is $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$.
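
As a quick sanity check (my own sketch, not part of the linked derivation; the data below are simulated and the variable names are made up), the closed form agrees with a generic least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))          # design matrix (treated as fixed)
beta = np.array([1.0, -2.0, 0.5])    # "true" coefficients
y = X @ beta + rng.normal(size=n)    # y = X beta + noise

# Closed-form OLS estimate
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Same estimate via a generic least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_hat, beta_lstsq))  # True
```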

Of course, since $\hat{\beta}$ is just an estimate, we want to know how far it deviates from the true value $\beta$.

In the derivation I am reading (How are the standard errors of coefficients calculated in a regression?), the variance of the estimate is given by:

$$ V(\hat{\beta}) = V((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y})= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\sigma^2 \mathbf{I}\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}$$

Please help me understand what happened in this derivation.

(Or you can present a simpler derivation)

Thanks

cgo
    Ben Lambert has a good video about the derivation of $V(\hat \beta)$ in matrix form for OLS [here](https://www.youtube.com/watch?v=11J0M7WBMy8). Hope this helps. – Giaco.Metrics Apr 09 '15 at 18:16

1 Answer


$\hat{\beta} = (X^TX)^{-1}X^TY$ where only $Y$ is random. This means that the variance of the estimator is completely induced by the distribution of $Y$.

It can easily be shown that if $A$ is a fixed (non-random) matrix and $Y$ is a random vector, then $Var(AY) = AVar(Y)A^T$ (assuming the dimensions are compatible). Thus if we let $Z = (X^TX)^{-1}X^T$ (which is fixed, since $X$ is), then $\hat{\beta} = ZY$. We haven't changed anything, but this makes the steps a little clearer.
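
For completeness, that identity follows directly from the definition of the covariance matrix, using $E[ZY] = ZE[Y]$ for fixed $Z$:

$$Var(ZY) = E\left[(ZY - ZE[Y])(ZY - ZE[Y])^T\right] = Z\,E\left[(Y - E[Y])(Y - E[Y])^T\right]Z^T = ZVar(Y)Z^T.$$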

$Var(\hat{\beta}) = ZVar(Y)Z^T$ by the identity mentioned above. Now we just need $Var(Y)$ and the rest is just plugging things in and cancelling. By assumption $Y = X\beta + \varepsilon$ where $\varepsilon \sim N(0, \sigma^2 I)$ so $Var(Y) = \sigma^2I$ (again, because $X$ is fixed).

This means that $Var(\hat{\beta}) = ZVar(Y)Z^T = Z \sigma^2I Z^T = \sigma^2 ZZ^T$. Now we can replace $Z$ with what it really is to get $Var(\hat{\beta}) = $ $$\sigma^2 (X^TX)^{-1}X^T ((X^TX)^{-1}X^T)^T = \sigma^2 (X^TX)^{-1}X^TX(X^TX)^{-1} = \sigma^2 (X^TX)^{-1}.$$

This uses the facts that $(X^TX)^{-1}$ is symmetric and that $(AB)^T = B^TA^T$.
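
If it helps intuition, here is a quick Monte Carlo sketch (my own illustration, with made-up numbers) comparing the empirical covariance of $\hat{\beta}$ over many simulated noise draws against $\sigma^2(X^TX)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 50, 2, 1.5
X = rng.normal(size=(n, p))        # fixed design, reused in every replication
beta = np.array([2.0, -1.0])       # true parameter
XtX_inv = np.linalg.inv(X.T @ X)

# Re-estimate beta on many datasets that share X but have fresh noise
reps = 20000
beta_hats = np.empty((reps, p))
for i in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hats[i] = XtX_inv @ X.T @ y

print(np.cov(beta_hats.T))         # empirical Var(beta_hat)
print(sigma**2 * XtX_inv)          # theoretical sigma^2 (X^T X)^{-1}
```

The two printed matrices should agree to within Monte Carlo error.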

jld
  • In $Y=X\beta + \epsilon$, why is $Var(X\beta) = 0$? – cgo Apr 09 '15 at 18:15
  • Because $X$ and $\beta$ are constants/parameters and the variance of a non-random quantity is 0. $X$ is assumed to be fixed and $\beta$ is a parameter. The whole point is to estimate this parameter. It's just like estimating the mean of a normal sample. – jld Apr 09 '15 at 18:17
  • You're very welcome. – jld Apr 09 '15 at 18:18