
I am studying parameters generated by the method of Ordinary Least Squares, in particular, a parameter's associated standard error. Wikipedia suggests calculating

$$s^2=\frac{S\left(\hat\beta\right)}{n-p}=\frac{\left\lVert y-X\hat\beta\right\rVert^2}{n-p}$$

describing this as

the standard error of the regression (SER), or standard error of the equation (SEE).

This, however, is not sufficient to calculate the standard error for a given parameter.

I found this, which suggests

For each parameter, the standard error can be obtained by:

$$s_{\hat \beta _j}=s \sqrt{C_{jj}}$$

where $C_{jj}$ is the $j^{\text{th}}$ diagonal element of $(X'X)^{-1}$ (note that $(X'WX)^{-1}$ is used for weighted fitting).

Note that using this technique yields precisely the same standard errors as those reported by the lm function in R.
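
For reference, here is a minimal R sketch of that claim (the dataset and formula are arbitrary, just R's built-in mtcars for illustration); the hand-computed values should match what summary() reports for the lm fit:

```r
# Illustration with R's built-in mtcars data (any model would do).
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)                     # n x p design matrix, first column is all ones
y <- mtcars$mpg
n <- nrow(X); p <- ncol(X)

beta_hat <- solve(t(X) %*% X, t(X) %*% y)  # OLS coefficients
s <- sqrt(sum((y - X %*% beta_hat)^2) / (n - p))   # standard error of the regression
C <- solve(t(X) %*% X)                     # the matrix whose diagonal entries are C_jj

s * sqrt(diag(C))                          # hand-computed standard errors
summary(fit)$coefficients[, "Std. Error"]  # the same numbers from lm
```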

Where does this $C$ come from? I have been reading about this for days, and this is the first reference I have seen to this matrix.

Joshua Cook
  • See http://stats.stackexchange.com/questions/135201, http://stats.stackexchange.com/questions/104704, and http://stats.stackexchange.com/questions/180478 – whuber Jun 05 '16 at 02:34

1 Answer


In matrix notation we have data $\left (\mathbf y, \mathbf X\right)$ and we consider the model

$$\mathbf y = \mathbf X\beta + \mathbf u$$

where for the moment we only assume that the regressor matrix contains a column of ones (an intercept), so that we can safely assume that the "error term" $\mathbf u$ has zero mean. We do not as yet make any statistical/probabilistic assumptions. Calculating the unknown betas by Ordinary Least Squares is a mathematical approximation method that requires no statistical assumptions. We obtain

$$\hat \beta = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf y$$

This is the (orthogonal) Linear Projection coefficient vector, and, as a mathematical approximation story, it stops here.
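
As a quick numerical check of this formula (a sketch, again using R's built-in mtcars data with an arbitrary regression), the projection coefficients agree with what lm() returns:

```r
# Compute the linear projection coefficients directly.
X <- model.matrix(~ wt + hp, data = mtcars)   # includes the column of ones
y <- mtcars$mpg

beta_hat <- solve(t(X) %*% X) %*% (t(X) %*% y)

beta_hat                                      # (X'X)^{-1} X'y
coef(lm(mpg ~ wt + hp, data = mtcars))        # same values from lm
```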

Now we want to talk about the "standard error" of the estimates. But that is a statistical concept, so we must assume something random and probabilistic. Assume that the regressors are all deterministic, but that $\mathbf u$ is a random vector. Because the regressor matrix contains a column of ones, we can take $E(\mathbf u) = \mathbf 0$, where $E$ denotes the expected value, without loss of generality (any nonzero mean would be absorbed into the intercept). We have

$$\hat \beta = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf y = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\left(\mathbf X\beta + \mathbf u\right) = \beta +\left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf u$$

$$\implies \hat \beta -\beta = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf u$$

Since $\beta$ is a constant, we have $\text{Var}(\hat \beta -\beta) = \text{Var}(\hat \beta)$, where the variance is the square of the standard deviation. The multivariate version of the variance (the variance-covariance matrix) is

$$\text{Var}(\hat \beta -\beta) = E\Big[(\hat \beta -\beta)(\hat \beta -\beta)'\Big] - E\Big[(\hat \beta -\beta)\Big]E\Big[(\hat \beta -\beta)\Big]'$$

where the prime denotes the transpose. Substituting,

$$\text{Var}(\hat \beta) = E\Big[\left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf u\mathbf u'\mathbf X\left(\mathbf X' \mathbf X\right) ^{-1} \Big] - E\Big[\left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf u\Big]E\Big[\left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf u\Big]'$$

Since we have assumed that $\mathbf X$ is deterministic, the expected value applies only to $\mathbf u$ so we have

$$\text{Var}(\hat \beta) = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'E(\mathbf u\mathbf u')\mathbf X\left(\mathbf X' \mathbf X\right) ^{-1} - \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'E(\mathbf u)E(\mathbf u)'\mathbf X\left(\mathbf X' \mathbf X\right) ^{-1} $$

Since $E(\mathbf u) =\mathbf 0$ we are left with

$$\text{Var}(\hat \beta) = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'E(\mathbf u\mathbf u')\mathbf X\left(\mathbf X' \mathbf X\right) ^{-1} $$

Now comes another benchmark statistical assumption: the error vector $\mathbf u$ is "homoskedastic" (with uncorrelated elements), which means

$$\text{Var}(\mathbf u) = E(\mathbf u\mathbf u') = \sigma^2I$$

where $\sigma^2 >0$ is the common variance of each element of the error vector, and $I$ is the identity matrix. Substituting we get

$$\text{Var}(\hat \beta) = \left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\sigma^2I\mathbf X\left(\mathbf X' \mathbf X\right) ^{-1} =\sigma^2\left(\mathbf X' \mathbf X\right) ^{-1} \mathbf X'\mathbf X\left(\mathbf X' \mathbf X\right) ^{-1}$$

$$\text{Var}(\hat \beta) = \sigma^2\left(\mathbf X' \mathbf X\right) ^{-1}$$

The $\sigma^2$ is estimated by $s^2$ as in the OP's question, and the diagonal elements of $\left(\mathbf X' \mathbf X\right) ^{-1}$, each multiplied by $s^2$, give the estimated variances of the corresponding elements of the estimated beta vector. Taking the square root of each gives the standard error of the corresponding element. The off-diagonal elements of the matrix, multiplied by $s^2$, give the estimated covariances between the elements of the beta vector.
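
To tie this back to the question, here is a minimal R sketch (again with the built-in mtcars data, purely for illustration) of the estimated variance-covariance matrix $s^2\left(\mathbf X' \mathbf X\right)^{-1}$ and the standard errors obtained from its diagonal:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit)
y <- mtcars$mpg
n <- nrow(X); p <- ncol(X)

s2   <- sum(residuals(fit)^2) / (n - p)       # estimate of sigma^2
Vhat <- s2 * solve(t(X) %*% X)                # estimated Var(beta_hat)

Vhat                                          # matches vcov(fit)
sqrt(diag(Vhat))                              # matches the standard errors in summary(fit)
```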

Alecos Papadopoulos