
Possible Duplicate:
Calculating the confidence interval for simple linear regression coefficient estimates

I was referring to the Wikipedia article http://en.wikipedia.org/wiki/Simple_linear_regression, which calculates confidence intervals for the regression parameters

$$ \hat\alpha \ \text{(intercept)} \quad \text{and} \quad \hat\beta \ \text{(slope)}, $$

where $\alpha$ and $\beta$ are the true population parameters. It then gives the variance of $\hat\beta$ as

$$ \operatorname{Var}(\hat\beta) = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}, $$

where $\sigma^2$ is the variance of the error term. I don't see how this variance was calculated, and I also didn't follow the rest of the derivation of the confidence interval. Any suggestions?

user34790
    It would be preferable to incorporate this into your previous question. Have you searched the site for duplicate and related questions? – cardinal Jun 30 '12 at 22:13
  • Please update your previous question. I'm closing this one as a duplicate. – chl Jul 01 '12 at 10:07

2 Answers


The variance of the residuals is just the sum of the squared residuals (you don't need to subtract off the mean, since the mean of the residuals is already 0) divided by $n-p$, where $p$ is the number of parameters estimated in the regression that produced the residuals (2 if you estimated an intercept and one slope). In math notation:

$$ \hat\sigma^2 = \frac{1}{n-p}\sum_{i=1}^{n} r_i^2 $$

The general formula for the confidence interval (a Wald interval) is the parameter estimate plus and minus a table value (which brings in the confidence level) times the standard error of that parameter. So you need a table value, which here is the t-table value with $n-p$ degrees of freedom, and the standard error of $\hat\beta$, which is the square root of the variance formula in your question with $\hat\sigma^2$ plugged in for $\sigma^2$.
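
To make that recipe concrete, here is a minimal sketch in Python with NumPy and SciPy (this is an added illustration, not part of the original answer; the data values and the 95% level are made-up assumptions):

```python
import numpy as np
from scipy import stats

# Made-up example data (assumption, purely for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.1])

n, p = len(x), 2                      # n observations, 2 parameters (intercept + slope)
x_bar = x.mean()

# Least-squares estimates of slope and intercept
beta_hat = np.sum((x - x_bar) * y) / np.sum((x - x_bar) ** 2)
alpha_hat = y.mean() - beta_hat * x_bar

# Residual variance: sum of squared residuals divided by n - p
residuals = y - (alpha_hat + beta_hat * x)
sigma2_hat = np.sum(residuals ** 2) / (n - p)

# Standard error of the slope: sqrt(sigma^2 / sum((x_i - x_bar)^2))
se_beta = np.sqrt(sigma2_hat / np.sum((x - x_bar) ** 2))

# Wald interval: estimate +/- t-quantile * standard error
t_crit = stats.t.ppf(0.975, df=n - p)  # two-sided 95% interval
ci = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)
print(f"slope = {beta_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The same pattern, with the appropriate standard error, gives the interval for the intercept $\hat\alpha$.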

Greg Snow
  • 46,563
  • 2
  • 90
  • 159
  • By variance I mean the variance of beta, not the residual variance – user34790 Jun 30 '12 at 23:10
  • The formula for the variance of beta is just what you have written above. Think back to the standard error of the mean for one-sample cases. The variance version of that standard error is $\frac{\sigma^2}{n}$. The variance for beta is similar, but instead of dividing by the sample size $n$, the variation depends on the distribution of the predictor variable ($x$). If the x values are more spread out, then there is less variability in the estimate; if the x values are all really close together, then a small change in one y value will have a bigger effect on the estimate of the slope, hence the denominator. – Greg Snow Jul 01 '12 at 01:48
  • Yeah, that's true, but how do we know when to use which formula? I mean, how was this formula derived? – user34790 Jul 01 '12 at 13:57
  • You can derive the formula by starting with the equation for the slope, $\frac{ \sum{(x_i - \bar{x}) y_i} }{ \sum{(x_i-\bar{x})^2} }$, and taking the variance of that. The $y_i$ are the only random quantities; everything else is fixed. So if you use the rules for the variance of a sum and the variance of a constant times your variable, you find $\sigma^2$ as the variance of the $y_i$'s, and all the constant pieces cancel until you are left with just the denominator above (this algebra is written out after this thread). – Greg Snow Jul 02 '12 at 21:26
  • I don't get how the equation for the slope comes to be that – user34790 Jul 03 '12 at 01:43
  • OK, what equation do you use for the slope? (There are many variations, so we might as well start with one you are comfortable with.) – Greg Snow Jul 03 '12 at 16:35
  • I use the simple one, $(y_2-y_1)/(x_2-x_1)$ – user34790 Jul 03 '12 at 16:50
  • And what do you do when there is a 3rd point? That formula is fine if all you have is 2 points, but in that case it is difficult to do meaningful inference: you have no measure of the variation. If you have more than 2 data points and you only use the first 2 to calculate the slope, then you are at the mercy of the ordering in the data set (and all the formulas talked about so far don't work). If you don't already know a formula for the slope, then you really need to spend some time with a good regression textbook/class; answers here will not be enough. – Greg Snow Jul 03 '12 at 17:10
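
For reference, here is the algebra from the comments above written out (an added sketch, not part of the original exchange). Treating the $x_i$ as fixed and the $y_i$ as independent with common variance $\sigma^2$, write $\hat\beta = \sum_i c_i y_i$ with $c_i = (x_i - \bar{x})/\sum_j (x_j - \bar{x})^2$. Then

$$ \operatorname{Var}(\hat\beta) = \sum_i c_i^2 \operatorname{Var}(y_i) = \sigma^2 \, \frac{\sum_i (x_i - \bar{x})^2}{\left(\sum_j (x_j - \bar{x})^2\right)^2} = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}. $$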

Because $\hat\beta = (X^TX)^{-1} X^T Y$ and $\mathrm{Var}(CY)=C \,\mathrm{Var}(Y)\, C^T$ (the matrix analogue of the scalar rule $\mathrm{Var}(cY)=c^2 \mathrm{Var}(Y)$), the $X$s appear in the denominator of the variance estimate.
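
Carrying that through (an added sketch, assuming homoskedastic errors, i.e. $\operatorname{Var}(Y) = \sigma^2 I$), with $C = (X^TX)^{-1}X^T$:

$$ \operatorname{Var}(\hat\beta) = (X^TX)^{-1}X^T \left(\sigma^2 I\right) X (X^TX)^{-1} = \sigma^2 (X^TX)^{-1}. $$

For simple linear regression the slope entry of $\sigma^2 (X^TX)^{-1}$ works out to exactly $\sigma^2/\sum_i (x_i - \bar{x})^2$, the formula from the question.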

Michael R. Chernick
  • Your first expression has a term that's gone missing. This is also very good *intuition* but it's not really the *explanation*. – cardinal Jun 30 '12 at 22:49
  • Fixed. Because the $\hat\beta_1$ component of the regression parameter is a linear combination of the $y$s, the $x$s appear as they do in the formula. This is maybe handwavy rather than explicit, but right nonetheless. – Michael R. Chernick Jun 30 '12 at 23:03
  • Yes. I guess what I was getting at was that the "reason" the $X$s appear in the denominator is through the determinant when taking the inverse of $X^T X$. But it is good intuition that the inverse of the matrix acts much like a division would in the case of real numbers. (Of course, there are algebraic notions at play here, but they're beyond what is necessary to address the problem at hand.) – cardinal Jun 30 '12 at 23:30