
I'm trying to fit a model of the form $Y=aX+b$ based on a number of $(X,Y)$ observations with non-independent errors in $Y$. I know the variance-covariance matrix of the errors on $Y$.

  1. How can I compute best-fit parameters $(a,b)$ and their var-cov matrix?
  2. Can a least squares regression take non-independent errors into account?
  3. Is there a more classical method to solve this kind of problem?
Mathieu
  • would it be fair to say you have something of the form $Y = X\beta + \varepsilon$ where $\varepsilon \sim \mathcal N(0, \Omega)$ with $\Omega$ known but not diagonal? Or are the errors both dependent and non-gaussian? At the very least, regardless of the actual distribution of $\varepsilon$, are you able to say $E(\varepsilon) = 0$ and $Var(\varepsilon) = \Omega$ with $\Omega$ known? – jld Mar 14 '18 at 19:40
  • If you mean $Y=\beta X+\epsilon$ in vector space (thus $\beta=(a,b)$), yes. And yes, I can assume that my errors are gaussian. – Mathieu Mar 14 '18 at 19:54
  • Possible duplicate of [How to do regression with known correlations among the errors?](https://stats.stackexchange.com/questions/69785/how-to-do-regression-with-known-correlations-among-the-errors) – jld Mar 14 '18 at 20:07
  • Thanks, that was very helpful. I'm still unsure how to compute the var-cov matrix of the best-fit parameters. A quick Wikipedia search was unsuccessful. Can you please point me in the right direction? – Mathieu Mar 14 '18 at 20:35
  • when $\Omega$ is known and not estimated, the GLS estimator $\hat \beta$ is of the form $MY$ for some known and nonrandom matrix $M$, so it'd just be $Var(\hat \beta) = M Var(Y) M^T = M \Omega M^T$, and then plug in $M = (X^T \Omega^{-1} X)^{-1}X^T \Omega^{-1}$ to get $Var(\hat \beta) = (X^T \Omega^{-1} X)^{-1}$ and note how this is a direct extension of the variance of $\hat \beta$ when the errors are homoscedastic – jld Mar 14 '18 at 20:39
  • That's what I suspected by analogy with the homoscedastic case, but I needed confirmation. Is there any way for you to format this as an answer, so that I can accept it? Thanks again. – Mathieu Mar 14 '18 at 20:47
  • sure, you're very welcome – jld Mar 14 '18 at 20:59

1 Answer


$\newcommand{\e}{\varepsilon}$Let $Y = X\beta + \e$ where $\e \sim \mathcal N(0, \Omega)$ with $\Omega$ known. Assume that $\Omega$ is not singular. Then we can take $L = \Omega^{-1/2}$, which satisfies $L \Omega L^T = I$, and multiply our equation through by $L$ to get $$ LY = LX\beta + L\e $$ where now $L\e \sim \mathcal N(0,I)$. Left-multiplying by a fixed matrix does not change $\beta$.

Since our errors are now iid we can happily apply the usual OLS procedure to get $$ \hat \beta = \left((LX)^T(LX)\right)^{-1}(LX)^T(LY) = (X^T \Omega^{-1} X)^{-1}X^T \Omega^{-1} Y. $$
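To make this concrete, here is a minimal numpy sketch of the whitening approach (the simulated data, variable names, and the exponential covariance are my own illustration, not from the question). It uses the inverse of the lower Cholesky factor of $\Omega$ as $L$, which also satisfies $L\Omega L^T = I$, and checks that OLS on the whitened data matches the closed form above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (assumed, not from the question): n observations of
# y = a*x + b with correlated Gaussian errors whose covariance Omega is known.
n = 50
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([x, np.ones(n)])                     # columns: slope a, intercept b
Omega = 0.5 * np.exp(-np.abs(x[:, None] - x[None, :]))   # example known error covariance
a_true, b_true = 2.0, -1.0
y = X @ np.array([a_true, b_true]) + rng.multivariate_normal(np.zeros(n), Omega)

# Whiten: any L with L @ Omega @ L.T = I works; here L is the inverse of the
# lower Cholesky factor of Omega rather than the symmetric square root.
C = np.linalg.cholesky(Omega)            # Omega = C @ C.T
LX = np.linalg.solve(C, X)               # L @ X  with  L = C^{-1}
Ly = np.linalg.solve(C, y)               # L @ y

# OLS on the whitened data ...
beta_ols, *_ = np.linalg.lstsq(LX, Ly, rcond=None)

# ... agrees with the closed-form GLS estimator (X^T Omega^{-1} X)^{-1} X^T Omega^{-1} y.
Omega_inv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
assert np.allclose(beta_ols, beta_gls)
print(beta_gls)                          # estimated (a, b)
```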

This is generalized least squares. The answer to the linked question covers the point estimation of $\hat \beta$ but doesn't seem to include the variance, so I'll derive that next.

Because we are not estimating $\Omega$ we have $$ E(\hat \beta) = (X^T \Omega^{-1} X)^{-1}X^T \Omega^{-1} X \beta = \beta $$ and, writing $M = (X^T \Omega^{-1} X)^{-1}X^T \Omega^{-1}$ so that $\hat \beta = MY$, $$ Var(\hat \beta) = M \, Var(Y) \, M^T = (X^T \Omega^{-1} X)^{-1}X^T \Omega^{-1} \, \Omega \, \Omega^{-1} X (X^T \Omega^{-1} X)^{-1} = (X^T \Omega^{-1} X)^{-1}, $$ so all together $$ \hat \beta \sim \mathcal N\left(\beta, (X^T \Omega^{-1} X)^{-1} \right) $$ since $\hat \beta$ is still a linear transformation of a Gaussian RV (namely $Y$).
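As a small sketch of this (again numpy; the function name is my own, and `X` and `Omega` are assumed to be the design matrix and known error covariance, e.g. from the snippet above):

```python
import numpy as np

def gls_coef_covariance(X, Omega):
    """Var(beta_hat) = (X^T Omega^{-1} X)^{-1} for GLS with Omega fully known."""
    Omega_inv = np.linalg.inv(Omega)
    return np.linalg.inv(X.T @ Omega_inv @ X)

# e.g. with X and Omega as above:
# cov_beta = gls_coef_covariance(X, Omega)
# standard errors of (a, b): np.sqrt(np.diag(cov_beta))
```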

Often GLS is done with $\Omega = \sigma^2 V$ where $V$ is known but $\sigma^2$ isn't, but here I've assumed $\Omega$ is known exactly.
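For completeness, a sketch of that variant (my own addition, not part of the setup above; the function name is hypothetical): with $\Omega = \sigma^2 V$, the scale $\sigma^2$ cancels in the point estimate and is typically estimated from the whitened residuals.

```python
import numpy as np

def gls_with_unknown_scale(X, y, V):
    """GLS when the error covariance is sigma^2 * V with V known but sigma^2 not."""
    V_inv = np.linalg.inv(V)
    XtVi = X.T @ V_inv
    beta_hat = np.linalg.solve(XtVi @ X, XtVi @ y)   # sigma^2 cancels in the estimate
    resid = y - X @ beta_hat
    n, p = X.shape
    sigma2_hat = resid @ V_inv @ resid / (n - p)     # estimate scale from whitened residuals
    cov_beta = sigma2_hat * np.linalg.inv(XtVi @ X)  # Var(beta_hat) = sigma^2 (X^T V^{-1} X)^{-1}
    return beta_hat, sigma2_hat, cov_beta
```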

jld