
Suppose I do linear regression for data $y \in \mathbb{R}^n$ and design matrix $X \in \mathbb{R}^{n \times m}$, with $n \gg m$. I seek $$ \hat{\beta} = \operatorname*{argmin}_{\beta \in \mathbb{R}^m} \| X\beta - y \|_2. $$

What are the ways to quantify uncertainty in $\hat{\beta}$? I have considered the bootstrap, and perhaps a Bayesian estimator with a prior that gives a closed-form expression for the variance of $\beta$. What other approaches are there?

References are appreciated but a full derivation (with intuition) would be ideal.

Full derivation: Proof that the coefficients in an OLS model follow a t-distribution with (n-k) degrees of freedom

Yair Daon
  • @moreblue I would really appreciate it if you could elaborate... – Yair Daon Apr 27 '19 at 23:44
  • Say the model is $y = X\beta + \epsilon$, and you assume $var(\epsilon) =\sigma^2 I_n$. Then the estimated variance of $\beta^\ast$ is $\hat{\sigma}^2(X^TX)^{-1}$, where $\hat{\sigma}^2$ is the [RSS](https://en.wikipedia.org/wiki/Residual_sum_of_squares) divided by its degrees of freedom (which is $n$ minus the number of coefficients). (Edited, because you do not need any distributional assumptions for $\epsilon$.) – moreblue Apr 27 '19 at 23:47
  • * where $\mathbb{E}(\epsilon)=0$... – moreblue Apr 27 '19 at 23:55
  • There are standard ways of finding a confidence region for $\widehat\beta$ (which you called $\beta^\star$) based on the facts that $(1)$ $\widehat\beta \sim N_m(\beta, \sigma^2(X^\top X)^{-1})$, $(2)$ $\| \widehat{\varepsilon\,} \|^2 / \sigma^2 \sim \chi^2_{n-m}$, and $(3)$ $\widehat\beta$ and $\widehat{\varepsilon\,}$ are independent of each other. – Michael Hardy Apr 28 '19 at 02:34
  • What happens if $X=X'+\epsilon'$ with $Var(\epsilon')=\sigma'^2$? – Boris Valderrama Mar 13 '20 at 01:49

1 Answer


Assume $y = X\beta + \epsilon$ with $\mathbb{E}[\epsilon] = 0$ and $Cov(\epsilon) = I\sigma^2$, where $X$ is fixed and has full column rank. From the normal equations, $\hat{\beta} = (X^tX)^{-1}X^ty$. Since $\hat{\beta} = \beta + (X^tX)^{-1}X^t\epsilon$ and $\mathbb{E}[\epsilon] = 0$, we have $\mathbb{E}[\hat{\beta}] = \beta$. Using the assumptions on $\epsilon$, we get

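$$ \begin{aligned} \mathbb{E}[\hat{\beta}\hat{\beta}^t] &= (X^tX)^{-1}X^t\,\mathbb{E}[(X\beta +\epsilon)(X\beta + \epsilon)^t]\, X(X^tX)^{-1} \\ &= (X^tX)^{-1}X^t\,(X\beta\beta^tX^t + \sigma^2 I)\, X(X^tX)^{-1} \\ &= \beta\beta^t + (X^tX)^{-1}\sigma^2. \end{aligned} $$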

Hence $$ Cov(\hat{\beta}) = \mathbb{E}[\hat{\beta}\hat{\beta}^t] - \mathbb{E}[\hat{\beta}] \mathbb{E}[\hat{\beta}^t] = (X^tX)^{-1}\sigma^2. $$
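As a sanity check, the formula $Cov(\hat{\beta}) = (X^tX)^{-1}\sigma^2$ can be compared against a simulation. The following is only a minimal NumPy sketch: the sizes `n`, `m`, the design matrix `X`, the coefficients `beta`, the noise level `sigma`, and the Gaussian noise are all made-up assumptions for illustration (the derivation itself only needs zero mean and covariance $I\sigma^2$).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Hypothetical problem sizes and parameters (not from the question)
n, m = 200, 3
sigma = 0.5                          # noise standard deviation, assumed known here
X = rng.normal(size=(n, m))          # made-up design matrix, held fixed throughout
beta = np.array([1.0, -2.0, 0.5])    # made-up true coefficients

# Covariance predicted by the derivation: sigma^2 (X^t X)^{-1}
cov_theory = sigma**2 * np.linalg.inv(X.T @ X)

# Monte Carlo: redraw the noise many times with X fixed and refit each time
n_rep = 20000
eps = rng.normal(scale=sigma, size=(n_rep, n))
Y = X @ beta + eps                   # shape (n_rep, n), one simulated dataset per row

# lstsq accepts several right-hand sides at once (one per column of Y.T)
beta_hats = np.linalg.lstsq(X, Y.T, rcond=None)[0].T   # shape (n_rep, m)

# Empirical covariance of the fitted coefficients across replications
cov_empirical = np.cov(beta_hats, rowvar=False)

print("theory:\n", cov_theory)
print("empirical:\n", cov_empirical)
```

The two printed matrices should agree up to Monte Carlo error, which shrinks as `n_rep` grows.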

I hope to extend this, at some later time, to the case where $\sigma$ is estimated from the data.
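In the meantime, the plug-in estimate mentioned in moreblue's comment under the question, $\hat{\sigma}^2 = \mathrm{RSS}/(n-m)$, can be sketched in a few lines of NumPy. This is an illustration under assumptions, not a derivation: the data below are simulated, and `n`, `m`, `X`, `beta`, and `sigma` are hypothetical values used only to generate `y`.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen only for reproducibility

# Hypothetical data (in practice X and y come from the problem at hand)
n, m = 500, 4
X = rng.normal(size=(n, m))
beta = np.array([2.0, 0.0, -1.0, 0.5])
sigma = 1.5                          # true noise level, used only to simulate y
y = X @ beta + rng.normal(scale=sigma, size=n)

# Least-squares fit
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Plug-in noise variance: residual sum of squares over (n - m) degrees of freedom
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - m)

# Estimated covariance of beta_hat and per-coefficient standard errors
cov_hat = sigma2_hat * np.linalg.inv(X.T @ X)
std_err = np.sqrt(np.diag(cov_hat))

print("beta_hat:", beta_hat)
print("std_err: ", std_err)
```

Here `std_err` holds one standard error per coefficient. Turning these into confidence intervals requires a distributional assumption on $\epsilon$, such as the normality used in Michael Hardy's comment.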

Yair Daon