
I am writing some code that uses a least-squares estimator. As mentioned HERE, the covariance matrix of the estimate can be obtained from $\sigma^2(X'X)^{-1}$.

The problem I have is that I have no idea how to get the constant factor $\sigma^2$. As the number of measurements goes up, the covariance seems to converge to the correct value, but of course, you never have infinite measurements.

So how do I find $\sigma^2$? I read that you can use the residuals to estimate it, but I don't see the relationship.

Tetragramm
  • You shouldn't write "infinite measurements" if you mean _infinitely many measurements_. If you do three measurements, and each one of those is infinite (whatever that word might mean, as applied to a measurement) then you have infinite measurements, but you don't have infinitely many, since there are only three. – Michael Hardy Dec 28 '16 at 01:50
  • If you know that the sum of squares of residuals has expected value $(n-p) \sigma^2,$ where $n$ is the number of observations and $p$ is the rank of $X,$ then you have that sum of squares divided by $n-p$ as an unbiased estimator of $\sigma^2.$ Is that what you're asking about? $\qquad$ – Michael Hardy Dec 28 '16 at 01:52
  • You are correct, I did mean infinitely many measurements. And that's what I asked about, although it might not be what I want to know. See the comments on Matthew's answer. – Tetragramm Dec 29 '16 at 02:25
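Michael Hardy's point that the residual sum of squares has expected value $(n-p)\sigma^2$ can be checked numerically. Here is a minimal sketch using NumPy; the design matrix, coefficients, and noise level are illustrative values of my own choosing, not from the question:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative setup: n observations, design matrix X of rank p,
# Gaussian noise with known variance sigma^2 = 2.
n, p, sigma2 = 50, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 0.5, -2.0])

# Average the residual sum of squares over many noise realizations.
trials = 2000
rss = np.empty(trials)
for t in range(trials):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    rss[t] = e @ e

# Since E[RSS] = (n - p) * sigma^2, the ratio RSS / (n - p)
# should average out to roughly sigma^2 = 2.
print(rss.mean() / (n - p))
```

The averaged ratio lands close to the true noise variance, which is exactly the unbiasedness claim in the comment.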

1 Answer


Let $\hat{\sigma}^2$ be defined as:

$$ \hat{\sigma}^2 = \frac{1}{n-k} \sum_{i=1}^n e_i^2 $$ where $e_i = y_i - \mathbf{x}_i \cdot \hat{\boldsymbol{\beta}}$ is the residual for observation $i$, $k$ is the number of regressors (including the constant term), and $\hat{\boldsymbol{\beta}}$ is the OLS estimate of $\boldsymbol{\beta}$. Under the OLS assumptions of linearity ($y_i = \mathbf{x}_i \cdot \boldsymbol{\beta} + \epsilon_i$), strict exogeneity ($E[\epsilon_i \mid X] = 0$), no multicollinearity, homoskedasticity, and no serial correlation (i.e. $E[\epsilon_i\epsilon_j] = 0$ for $i \neq j$), $\hat{\sigma}^2$ is a consistent, unbiased estimator of $\sigma^2$.
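A minimal numerical sketch of this estimator in NumPy (the simulated data and variable names are mine, not part of the answer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate y = X beta + eps with known noise variance sigma^2 = 4.
n, k, sigma2 = 200, 3, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

# OLS estimate and residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat

# Unbiased estimate of sigma^2: RSS / (n - k).
sigma2_hat = e @ e / (n - k)

# Plug into the covariance formula from the question.
cov_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)
print(sigma2_hat)  # close to the true sigma^2 = 4
```

With a few hundred observations, `sigma2_hat` lands close to the true noise variance, and `cov_beta_hat` is the estimated covariance of $\hat{\boldsymbol{\beta}}$.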


For reference, see Econometrics by Fumio Hayashi, Chapter 1.

Matthew Gunn
  • Perhaps I'm asking the wrong question, but the number I get from that equation is nowhere near what the $(X'X)^{-1}$ needs to be multiplied by to match the actual uncertainty. – Tetragramm Dec 29 '16 at 01:58
  • The result I get using that estimate is ~100, but I have to divide the $(X'X)^{-1}$ by ~5000 to match the actual uncertainty. For more information, I'm simulating data by adding gaussian noise to the truth for each measurement. I extract the $(X'X)^{-1}$ from the estimator, and compare it to a covariance from the estimated state and the true state. Furthermore, as I add more measurements, the $\sigma^2$ doesn't change much, but the constant factor between the estimator's covariance and the true covariance changes a lot. – Tetragramm Dec 29 '16 at 02:09
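The comparison described in this comment can be reproduced directly: re-draw the noise many times, re-estimate $\boldsymbol{\beta}$ each time, and compare the empirical covariance of the estimates with $\sigma^2(X'X)^{-1}$. A hedged sketch (all simulation parameters are my own, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed design matrix, known noise variance sigma^2 = 9.
n, k, sigma2 = 100, 2, 9.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, -1.0])
XtX_inv = np.linalg.inv(X.T @ X)

# Repeatedly re-draw the noise and re-estimate beta.
trials = 5000
estimates = np.empty((trials, k))
for t in range(trials):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    estimates[t] = np.linalg.lstsq(X, y, rcond=None)[0]

# Empirical covariance of beta_hat vs. the theoretical sigma^2 (X'X)^{-1}.
emp_cov = np.cov(estimates, rowvar=False)
theo_cov = sigma2 * XtX_inv
print(np.diag(emp_cov), np.diag(theo_cov))
```

When the two diagonals agree, the scale factor is right; a large mismatch like the ~50x gap described in the comment usually points to an inconsistency in how the "actual uncertainty" is being measured rather than in the formula itself.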