
I just read this very insightful post about ridge regression, where the author stated that the variance of $\hat\beta$ is:

$$\text{var}(\hat\beta) = \sigma^2(\textbf{X}^\prime \textbf{X})^{-1}.$$

I couldn't figure out why this holds. Can anyone elaborate a bit?

amoeba
user152503

2 Answers


The covariance result you are looking at occurs under a standard regression model using ordinary least-squares (OLS) estimation. The OLS estimator (written as a random variable) is given by:

$$\begin{equation} \begin{aligned} \hat{\boldsymbol{\beta}} &= (\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1} (\boldsymbol{x}^{\text{T}} \boldsymbol{Y}) \\[6pt] &= (\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1} \boldsymbol{x}^{\text{T}} (\boldsymbol{x} \boldsymbol{\beta} + \boldsymbol{\varepsilon}) \\[6pt] &= \boldsymbol{\beta} + (\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1} \boldsymbol{x}^{\text{T}} \boldsymbol{\varepsilon}. \end{aligned} \end{equation}$$

In the standard linear regression model we have $\mathbb{E}(\boldsymbol{\varepsilon}) = \boldsymbol{0}$ and $\mathbb{V}(\boldsymbol{\varepsilon}) = \sigma^2 \boldsymbol{I}$ so that the estimator is unbiased with covariance matrix given by:

$$\begin{equation} \begin{aligned} \mathbb{V}(\hat{\boldsymbol{\beta}}) &= \mathbb{V}((\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1} \boldsymbol{x}^{\text{T}} \boldsymbol{\varepsilon}) \\[6pt] &= ((\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1} \boldsymbol{x}^{\text{T}} ) \mathbb{V}(\boldsymbol{\varepsilon}) ((\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1} \boldsymbol{x}^{\text{T}} )^{\text{T}} \\[6pt] &= \sigma^2 ((\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1} \boldsymbol{x}^{\text{T}} ) \boldsymbol{I} ((\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1} \boldsymbol{x}^{\text{T}} )^{\text{T}} \\[6pt] &= \sigma^2 ((\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1} \boldsymbol{x}^{\text{T}} ) ((\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1} \boldsymbol{x}^{\text{T}} )^{\text{T}} \\[6pt] &= \sigma^2 (\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1} (\boldsymbol{x}^{\text{T}} \boldsymbol{x}) (\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1} \\[6pt] &= \sigma^2 (\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1}. \end{aligned} \end{equation}$$

Note that this is the conditional covariance of the estimator given the design matrix $\boldsymbol{x}$.
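This result is easy to check by simulation: holding the design matrix fixed, draw many response vectors, refit OLS each time, and compare the empirical covariance of the estimates with $\sigma^2 (\boldsymbol{x}^{\text{T}} \boldsymbol{x})^{-1}$. A minimal sketch in NumPy (the design matrix, coefficients, and noise scale below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed design matrix (we condition on x throughout), with an intercept column.
n, p = 50, 3
x = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])   # true coefficients (arbitrary)
sigma = 1.5                          # noise standard deviation (arbitrary)

# Theoretical covariance: sigma^2 (x'x)^{-1}
theory = sigma**2 * np.linalg.inv(x.T @ x)

# Monte Carlo: simulate many response vectors and refit OLS on each.
n_sims = 100_000
xtx_inv_xt = np.linalg.inv(x.T @ x) @ x.T            # the fixed matrix (x'x)^{-1} x'
Y = x @ beta + sigma * rng.normal(size=(n_sims, n))  # each row is one simulated Y
estimates = Y @ xtx_inv_xt.T                          # each row is one beta-hat

empirical = np.cov(estimates, rowvar=False)
print(np.max(np.abs(empirical - theory)))  # shrinks toward 0 as n_sims grows
```

The empirical covariance matrix of the refitted estimates matches the theoretical one up to Monte Carlo error.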

Ben

Four things to note:

$\hat{\beta} =(\textbf{X}^\prime\textbf{X})^{-1}\textbf{X}^\prime\textbf{Y}$

$\text{var}({\textbf{A}\textbf{Y}})=\textbf{A}\text{var}(\textbf{Y})\textbf{A}^\prime$

$\text{var}(\textbf{Y}|\textbf{X})=\sigma^2\textbf{I}$ (actually, everything is conditioned on $\textbf{X}$)

$(\textbf{X}^\prime\textbf{X})^{-1}$ is symmetric.

Putting these together:

$$\text{var}(\hat\beta|\textbf{X}) = (\textbf{X}^\prime\textbf{X})^{-1}\textbf{X}^\prime \,\sigma^2\textbf{I}\, \textbf{X}(\textbf{X}^\prime\textbf{X})^{-1} = \sigma^2(\textbf{X}^\prime\textbf{X})^{-1}.$$
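These four facts can be checked numerically in a few lines. A sketch with an arbitrary design matrix `X` and noise variance (both made up for illustration), writing $\hat\beta = \textbf{A}\textbf{Y}$ with $\textbf{A} = (\textbf{X}^\prime\textbf{X})^{-1}\textbf{X}^\prime$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary design matrix X and noise variance, for illustration only.
X = rng.normal(size=(20, 4))
sigma2 = 2.0

A = np.linalg.inv(X.T @ X) @ X.T             # beta-hat = A Y
sandwich = A @ (sigma2 * np.eye(20)) @ A.T   # var(A Y) = A var(Y) A'
target = sigma2 * np.linalg.inv(X.T @ X)     # the claimed simplification

print(np.allclose(sandwich, target))   # the sandwich collapses to sigma^2 (X'X)^{-1}
print(np.allclose(target, target.T))   # (X'X)^{-1} is indeed symmetric
```

Both checks print `True`: applying the $\textbf{A}\,\text{var}(\textbf{Y})\,\textbf{A}^\prime$ rule with $\text{var}(\textbf{Y}|\textbf{X}) = \sigma^2\textbf{I}$ collapses to $\sigma^2(\textbf{X}^\prime\textbf{X})^{-1}$.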

sjw