Least squares regression when data has error bars

Question

Suppose I have some data $(x_i, y_i)$. With ordinary least squares, we can get standard errors of the slope and intercept from estimates such as $\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1}$.

However, suppose each $y_i$ comes with its own standard error $s_i$, possibly from some previous procedure. How would we obtain standard errors of the ordinary least squares slope and intercept that account for the $s_i$? Or should we use a different regression technique?

Do you mean, you have some regression model $y_i = \mathbf{x}_i' \mathbf{\beta} + \epsilon_i$ and for whatever reason, you know that the variance of $\epsilon_i$ is $\sigma^2_i$, that is, (i) the variance of the residual is different for different observations (ii) you know what the variance for each observation is? Is this correct? — Matthew Gunn, Oct 08 '16 at 04:19
@MatthewGunn Yes, I think that is what I mean, up to the fact that $s_i^2$ may only be an estimate of the true $\sigma_i^2$. But we can temporarily consider the simpler case where $\sigma_i^2$ is known for now. — angryavian, Oct 08 '16 at 04:32

Matthew Gunn · Accepted Answer · 2016-10-08T05:09:50.100

Assume the regression model: $$ y_i = \mathbf{x}'_i \boldsymbol{\beta} + \epsilon_i$$

Let $\boldsymbol{\epsilon}$ be your vector of error terms.

If you know that $ \mathrm{Var}\left(\boldsymbol{\epsilon} \right) = \Omega$, for example:

$$ \Omega = \begin{bmatrix} \sigma^2_1 & 0 & 0 & \cdots & 0\\ 0 & \sigma^2_2 & 0 & \cdots & 0\\ 0 & 0 & \sigma^2_3 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & \sigma^2_n \end{bmatrix} $$

Then you can estimate $\boldsymbol{\beta}$ more efficiently using generalized least squares (GLS).

The estimator $\hat{\mathbf{b}}$ for GLS is given by:

$$\hat{\mathbf{b}} = \left(X'\Omega^{-1} X \right)^{-1}\left(X'\Omega^{-1} \mathbf{y} \right) $$

The basic idea with GLS is to give observations that are more precisely observed higher weight. The danger of this approach of course is that if $\Omega$ is not correct, you can end up with something far worse than the equal weighting of regular OLS.

Note also that weighted least squares, which is what you get with the diagonal $\Omega$ above, is a special case of GLS.
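For the straight-line case in the question (an intercept and a slope, with diagonal $\Omega$), the GLS formula above reduces to a few weighted sums. Here is a minimal Python sketch, assuming the $\sigma_i$ are known exactly; the function name `wls_line` and the sample data are made up for illustration:

```python
def wls_line(x, y, sigma):
    """Fit y = a + b*x by weighted least squares with weights 1/sigma_i^2.

    This is b = (X' Omega^-1 X)^-1 X' Omega^-1 y written out for
    X = [1, x] and Omega = diag(sigma_i^2). Returns (intercept, slope).
    """
    w = [1.0 / s ** 2 for s in sigma]          # diagonal of Omega^-1
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2                  # determinant of X' Omega^-1 X
    intercept = (Sxx * Sy - Sx * Sxy) / delta
    slope = (S * Sxy - Sx * Sy) / delta
    return intercept, slope


# Hypothetical data: the last point is far off the trend but has a large error bar.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 12.0]
sigma = [0.1, 0.1, 0.1, 5.0]
print(wls_line(x, y, sigma))
```

Because the last point has a large $\sigma_i$, it receives very little weight, so the fit is driven almost entirely by the three precisely measured points.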

If you just want to use OLS estimation but calculate standard errors assuming you know $\mathrm{Var}\left(\boldsymbol{\epsilon}\right)$, then:

\begin{align*}
\mathrm{Var}\left( \hat{\mathbf{b}}_{OLS} \right) &= \mathrm{Var}\left( \left(X'X \right)^{-1}X'\left(X\boldsymbol{\beta} + \boldsymbol{\epsilon} \right) \right)\\
&=\left(X'X \right)^{-1}X' \mathrm{Var}\left( \boldsymbol{\epsilon} \right) X \left(X'X \right)^{-1} \\
&=\left(X'X \right)^{-1}X' \Omega X \left(X'X \right)^{-1}
\end{align*}
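For a straight-line fit with independent errors, this sandwich formula can be evaluated without any matrix algebra, because each OLS coefficient is a fixed linear combination of the $y_i$. The sketch below assumes the $\sigma_i$ are known; the function name `ols_line_known_errors` is made up for illustration:

```python
import math


def ols_line_known_errors(x, y, sigma):
    """OLS fit of y = a + b*x, with standard errors from known sigma_i.

    OLS is linear in y, so each coefficient is a weighted sum of the y_i.
    With independent errors, its variance is sum(coef_i^2 * sigma_i^2),
    which is the diagonal of (X'X)^-1 X' Omega X (X'X)^-1 for diagonal Omega.
    Returns ((intercept, slope), (se_intercept, se_slope)).
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    c = [(xi - xbar) / sxx for xi in x]        # slope = sum(c_i * y_i)
    d = [1.0 / n - xbar * ci for ci in c]      # intercept = sum(d_i * y_i)
    slope = sum(ci * yi for ci, yi in zip(c, y))
    intercept = sum(di * yi for di, yi in zip(d, y))
    se_slope = math.sqrt(sum((ci * si) ** 2 for ci, si in zip(c, sigma)))
    se_intercept = math.sqrt(sum((di * si) ** 2 for di, si in zip(d, sigma)))
    return (intercept, slope), (se_intercept, se_slope)


# Hypothetical data with unequal error bars.
coef, se = ols_line_known_errors([0.0, 1.0, 2.0, 3.0],
                                 [1.1, 2.9, 5.2, 6.8],
                                 [0.1, 0.2, 0.5, 1.0])
print(coef, se)
```

When all $\sigma_i$ equal a common $\sigma$, these standard errors reduce to the square roots of the diagonal of the usual $\sigma^2 (X'X)^{-1}$.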

The variance of the GLS estimator is given by:

\begin{align*}
\mathrm{Var}\left( \hat{\mathbf{b}}_{GLS} \right) &= \mathrm{Var}\left( \left(X'\Omega^{-1} X \right)^{-1}X'\Omega^{-1}\left(X\boldsymbol{\beta} + \boldsymbol{\epsilon} \right) \right)\\
&=\left(X'\Omega^{-1}X \right)^{-1}X'\Omega^{-1} \mathrm{Var}\left( \boldsymbol{\epsilon} \right) \Omega^{-1} X \left(X'\Omega^{-1}X \right)^{-1} \\
&=\left(X'\Omega^{-1}X \right)^{-1}X'\Omega^{-1} \Omega\Omega^{-1} X \left(X'\Omega^{-1}X \right)^{-1} \\
&=\left(X'\Omega^{-1}X \right)^{-1}X'\Omega^{-1} X \left(X'\Omega^{-1}X \right)^{-1}\\
&=\left(X'\Omega^{-1}X \right)^{-1}
\end{align*}
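For a straight line with diagonal $\Omega$, the matrix $X'\Omega^{-1}X$ is only $2 \times 2$, so its inverse can be written out by hand. A minimal Python sketch of the resulting standard errors, assuming the $\sigma_i$ are known; the function name `wls_line_std_errors` is made up for illustration:

```python
import math


def wls_line_std_errors(x, sigma):
    """Standard errors of the WLS intercept and slope for y = a + b*x.

    Computes the square roots of the diagonal of (X' Omega^-1 X)^-1
    for X = [1, x] and Omega = diag(sigma_i^2).
    Returns (se_intercept, se_slope).
    """
    w = [1.0 / s ** 2 for s in sigma]          # diagonal of Omega^-1
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    delta = S * Sxx - Sx ** 2                  # determinant of X' Omega^-1 X
    return math.sqrt(Sxx / delta), math.sqrt(S / delta)


# Hypothetical design points and error bars.
print(wls_line_std_errors([0.0, 1.0, 2.0, 3.0], [0.1, 0.2, 0.5, 1.0]))
```

Note that $\mathbf{y}$ does not enter: with $\Omega$ known, the standard errors depend only on the $x_i$ and the $\sigma_i$.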

Is the variance of the weighted least squares estimator the following? $$ Var(\hat{b}) = (X^\top \Omega^{-1} X)^{-1} X^\top \Omega^{-1} Var(\epsilon) \Omega^{-1} X(X^\top \Omega^{-1} X)^{-1} = (X^\top \Omega^{-1} X)^{-1} $$ — angryavian, Oct 08 '16 at 05:02
@angryavian Yes, my typos should be fixed and the variance is given in my answer now too. — Matthew Gunn, Oct 08 '16 at 05:11
I think if you have a ["square root"](http://stats.stackexchange.com/a/238977/127790) so that $\Omega^{-1}=W^TW$, then I *think* the GLS problem for $(X,y,\Omega)$ is equivalent to an OLS problem for $(\hat{X},\hat{y})=(WX,Wy)$. — GeoMatt22, Oct 08 '16 at 05:15
@GeoMatt22 Yes, it seems to be discussed on the [Wikipedia page](https://en.wikipedia.org/wiki/Generalized_least_squares#Properties). — angryavian, Oct 08 '16 at 05:16
@matthew I had some typos of my own as well, sorry. Thank you so much for your thorough explanations/references! Very much appreciated. — angryavian, Oct 08 '16 at 05:17
