
I have a question about the distribution of the estimated betas in a multiple linear regression.

The estimated parameter vector is $\hat{\beta}=(X'X)^{-1}X'y$, where $X = [1 \;\; x]$ is the $n \times 2$ data matrix.

Substitute $X\beta + \epsilon$ for $y$.

Then calculate $\text{var}(\hat{\beta})=\text{var}[\beta+(X'X)^{-1}X'\epsilon]$.

Using this relation, how do we get that

$$\hat{\text{var}}[\hat{\beta}]=[(X'X)^{-1}/(N-p-1)]\sum_i(e_i^2),$$

where $e=y-X\hat{\beta}$?
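As a concrete illustration of the formula being asked about, here is a minimal pure-Python sketch for the simple case $X = [1 \;\; x]$ (so $p = 1$ and $N-p-1 = N-2$). It inverts the $2 \times 2$ matrix $X'X$ in closed form rather than calling a linear-algebra library; the function name `ols_simple` and the toy data are hypothetical, chosen only for illustration.

```python
def ols_simple(x, y):
    """OLS fit of y on X = [1 x].

    Returns (beta_hat, sigma2_hat, cov_hat), where
    cov_hat = sigma2_hat * (X'X)^{-1} as a 2x2 nested list.
    """
    n = len(x)
    # Entries of X'X = [[n, sum x], [sum x, sum x^2]] and X'y = [sum y, sum xy]
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # determinant of X'X; zero if all x are equal
    # Closed-form inverse of the 2x2 matrix X'X
    xtx_inv = [[sxx / det, -sx / det],
               [-sx / det, n / det]]
    # beta_hat = (X'X)^{-1} X'y
    b0 = xtx_inv[0][0] * sy + xtx_inv[0][1] * sxy
    b1 = xtx_inv[1][0] * sy + xtx_inv[1][1] * sxy
    # Residuals e = y - X beta_hat
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 1  # number of non-intercept columns in X
    sigma2_hat = sum(ei * ei for ei in e) / (n - p - 1)
    # Estimated covariance: sigma2_hat * (X'X)^{-1}
    cov_hat = [[sigma2_hat * v for v in row] for row in xtx_inv]
    return (b0, b1), sigma2_hat, cov_hat


# Hypothetical toy data, for illustration only
beta_hat, s2, cov = ols_simple([0, 1, 2, 3], [1, 2, 2, 4])
print(beta_hat, s2, cov)
```

Worked by hand for this toy data, $\hat\beta = (0.9, 0.9)$, the residuals are $(0.1, 0.2, -0.7, 0.4)$, and $\hat\sigma^2 = 0.70/2 = 0.35$, so the diagonal of $\hat\sigma^2 (X'X)^{-1}$ is $(0.245, 0.07)$.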

mpiktas
bgbgh
  • In particular, I read that the degrees of freedom are N-p-1, where p is the number of parameters. How do we get this? – bgbgh Sep 25 '11 at 22:11
  • Regarding degrees of freedom, look at [this CV Q&A](http://stats.stackexchange.com/questions/884/what-are-degrees-of-freedom) or [this nice explanation](http://www.jerrydallal.com/LHSP/dof.htm) referred to in one of the answers. – Karl Sep 25 '11 at 23:41
  • Rewrite your variance as $\hat{\text{var}}[\hat{\beta}]=[(X'X)^{-1}/(N-p-1)]\sum_i(e_i^2)=\hat{\sigma}^2(X'X)^{-1}$. So it seems your question is really: why do we estimate $\sigma^2$ as $\frac{1}{N-p-1}\sum_ie_i^2$? – probabilityislogic Sep 26 '11 at 21:44

1 Answer


It doesn't help much to substitute $X\beta + \epsilon$ for $y$.

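The key is that $(X'X)^{-1}X'$ is fixed (we treat $X$ as given), that $\text{var}(y|X) = \sigma^2 I$, and that $\text{var}(Az) = A\,\text{var}(z)\,A'$ for any fixed matrix $A$. Taking $A = (X'X)^{-1}X'$, so that $A' = X(X'X)^{-1}$ because $X'X$ is symmetric:

$$\text{var}[\hat{\beta}|X] = \text{var}[(X'X)^{-1}X'y|X] = (X'X)^{-1}X'\,\text{var}[y|X]\,X(X'X)^{-1} = \sigma^2(X'X)^{-1}X'X(X'X)^{-1} = \sigma^2(X'X)^{-1}$$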
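The identity can also be checked numerically. The sketch below is a Monte Carlo check under assumed, hypothetical settings (fixed design $x = 0$ through $9$, $\beta = (1, 0.5)$, $\sigma = 2$, normal errors, 20,000 replications): it holds $x$ fixed, redraws $\epsilon$, refits, and compares the empirical covariance of $(\hat\beta_0, \hat\beta_1)$ with $\sigma^2(X'X)^{-1}$. Normality is not needed for the variance formula; it is only a convenient way to draw errors.

```python
import random


def fit_line(x, y):
    """Closed-form OLS for X = [1 x]; algebraically equal to (X'X)^{-1} X'y."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def simulate_beta_cov(x, beta, sigma, reps, seed=0):
    """Hold x fixed, redraw eps ~ N(0, sigma^2 I) each replication, refit,
    and return the empirical 2x2 covariance matrix of (b0, b1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        y = [beta[0] + beta[1] * xi + rng.gauss(0.0, sigma) for xi in x]
        draws.append(fit_line(x, y))
    m0 = sum(d[0] for d in draws) / reps
    m1 = sum(d[1] for d in draws) / reps
    c00 = sum((d[0] - m0) ** 2 for d in draws) / (reps - 1)
    c11 = sum((d[1] - m1) ** 2 for d in draws) / (reps - 1)
    c01 = sum((d[0] - m0) * (d[1] - m1) for d in draws) / (reps - 1)
    return [[c00, c01], [c01, c11]]


def theoretical_cov(x, sigma):
    """sigma^2 (X'X)^{-1} for X = [1 x], using the closed-form 2x2 inverse."""
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    det = n * sxx - sx * sx
    s2 = sigma ** 2
    return [[s2 * sxx / det, -s2 * sx / det],
            [-s2 * sx / det, s2 * n / det]]


x = list(range(10))  # hypothetical fixed design
print(simulate_beta_cov(x, (1.0, 0.5), 2.0, 20000, seed=1))
print(theoretical_cov(x, 2.0))
```

For this design the theoretical matrix is $\sigma^2(X'X)^{-1} = 4 \cdot \frac{1}{825}\begin{pmatrix} 285 & -45 \\ -45 & 10 \end{pmatrix}$, and the simulated entries should agree with it up to Monte Carlo noise of roughly 1%.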

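and then we estimate $\sigma^2$ by $\hat\sigma^2 = \sum_i e_i^2/(n-2)$, which gives $\hat{\text{var}}[\hat\beta] = \hat\sigma^2(X'X)^{-1}$, the formula in the question.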

[$n-p-1$ reduces to $n-2$ when the number of non-intercept columns is $p=1$.]
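For the degrees-of-freedom question raised in the comments, a standard argument (not part of the original answer) shows why dividing by $n-p-1$ gives an unbiased estimate. It assumes $E[\epsilon] = 0$, $\text{var}(\epsilon) = \sigma^2 I$, and that $X$ is fixed with full column rank $p+1$. Write $H = X(X'X)^{-1}X'$; then $(I-H)X = 0$, and $I-H$ is symmetric and idempotent:

$$
\begin{aligned}
e &= y - X\hat\beta = (I - H)y = (I - H)\epsilon,\\
E\Big[\sum_i e_i^2\Big] &= E[\epsilon'(I-H)'(I-H)\epsilon] = E[\epsilon'(I-H)\epsilon] = E\big[\operatorname{tr}\big((I-H)\epsilon\epsilon'\big)\big]\\
&= \operatorname{tr}\big((I-H)\,\sigma^2 I\big) = \sigma^2\big(n - \operatorname{tr}(H)\big),\\
\operatorname{tr}(H) &= \operatorname{tr}\big((X'X)^{-1}X'X\big) = \operatorname{tr}(I_{p+1}) = p + 1.
\end{aligned}
$$

So $E[\sum_i e_i^2] = \sigma^2(n-p-1)$, and $\hat\sigma^2 = \sum_i e_i^2/(n-p-1)$ is unbiased for $\sigma^2$. Informally, each of the $p+1$ estimated coefficients uses up one degree of freedom; with $p = 1$ this is the $n-2$ above.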

Karl
  • Thanks a lot for your reply, Prof. Broman. Two follow-up questions, please forgive my ignorance: 1) How can we estimate sigma? By just calculating the error variance? 2) What is the notion of degrees of freedom, and why is it n-p-1? – bgbgh Sep 26 '11 at 00:43
  • Regarding degrees of freedom, look at the links in my comment under your question. Regarding estimating $\sigma^2$, what don't you understand? – Karl Sep 26 '11 at 01:12
  • Sure, so the variance of y can be calculated from the data set, right? As sum(y-ybar)^2/(n-1), where ybar is the mean of the values of y. – bgbgh Sep 26 '11 at 01:48
  • How do the errors give the variance estimate? In general, that leads to a more theoretical argument. When we postulate a linear model, say y = Xbeta + e (1), let's say that beta here are the population values of the coefficients, so their variances are all zero. Using OLS we estimate the betas; call the estimators beta'. – bgbgh Sep 26 '11 at 01:48
  • So now we have the equation y = Xbeta' + e' (2). Note that this e' is different from e above. Is this reasoning sound? In equation (1), V(y) = V(e), since V(Xbeta) = 0, where V denotes variance. – bgbgh Sep 26 '11 at 01:48
  • In equation (2), V(y) = V(Xbeta') + V(e'). Nonetheless, we can always estimate V(y) directly from the data set as V(y) = sum(y-ybar)^2/(n-1). – bgbgh Sep 26 '11 at 01:49
  • Please let me know if you have questions, thanks again! – bgbgh Sep 26 '11 at 01:50
  • But you want $\text{var}(y|x)$ not the unconditional variance. – Karl Sep 26 '11 at 02:26
  • OK, thanks. I guess the idea is that x is held constant? But do you agree in general with equations (1) and (2), (1) being the equation with the population beta parameters and (2) the one with the OLS estimators? I guess we can realistically only use (2) to estimate V(y|x), which really is V(e') as per equation (2). Does that seem reasonable? – bgbgh Sep 26 '11 at 11:05
  • @bgbgh - perhaps turn that into another question – Karl Sep 26 '11 at 11:08
  • @Karl - I think you mean $\text{var}(\epsilon)=\sigma^2\mathbf{I}$ – probabilityislogic Sep 26 '11 at 21:42
  • @probabilityislogic - Indeed, I edited it to be $\text{var}(y|x)$. If you think of x as fixed, then $\text{var}(y) = \text{var}(y|x) = \text{var}(\epsilon)$. But this seems to be a key issue underlying the question. – Karl Sep 26 '11 at 21:55
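The distinction Karl draws between $\text{var}(y)$ and $\text{var}(y|x)$ shows up clearly in a small simulation. The sketch below uses hypothetical settings ($x = 0$ through $199$ held fixed, intercept 1, slope 0.1, $\sigma = 1$, normal errors) and compares the sample variance sum(y-ybar)^2/(n-1), which ignores x, with the residual-based $\hat\sigma^2$ on $n-2$ degrees of freedom. The former also absorbs the spread of $X\beta$ across the design, so here it is far larger than $\sigma^2$, while $\hat\sigma^2$ stays close to $\sigma^2$.

```python
import random


def compare_variances(x, b0, b1, sigma, seed=0):
    """Simulate y = b0 + b1*x + eps with x fixed and eps ~ N(0, sigma^2).

    Returns (var_y, sigma2_hat): the unconditional sample variance of y,
    and the residual variance from an OLS fit of y on [1 x] with n - 2 df.
    """
    rng = random.Random(seed)
    n = len(x)
    y = [b0 + b1 * xi + rng.gauss(0.0, sigma) for xi in x]
    ybar = sum(y) / n
    # Ignores x entirely, so it mixes the spread of X*beta with the noise
    var_y = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    # OLS fit of y on [1 x]
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    # Residual sum of squares divided by n - p - 1 = n - 2
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)
    return var_y, sigma2_hat


var_y, sigma2_hat = compare_variances(list(range(200)), 1.0, 0.1, 1.0, seed=3)
print(var_y, sigma2_hat)
```

With these settings the slope contributes about $0.1^2 \times 3350 \approx 33.5$ to the unconditional variance of y, so var_y lands near 34.5 while sigma2_hat stays near $\sigma^2 = 1$.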