
Possible Duplicate:
Calculating the confidence interval for simple linear regression coefficient estimates

I was referring to the Wikipedia article http://en.wikipedia.org/wiki/Simple_linear_regression, which calculates confidence intervals for the regression parameters

$$ \hat\alpha \ \text{(intercept)} \quad \text{and} \quad \hat\beta \ \text{(slope)}, $$

where $\alpha$ and $\beta$ are the true population parameters. It then gives the variance of $\hat\beta$ as

$$ \operatorname{Var}(\hat\beta) = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}, $$

where $\sigma^2$ is the variance of the error term. I don't see how this variance was calculated, and I also didn't follow the rest of the derivation of the confidence interval. Any suggestions?

user34790
    It would be preferable to incorporate this into your previous question. Have you searched the site for duplicate and related questions? – cardinal Jun 30 '12 at 22:13
  • Please update your previous question. I'm closing this one as a duplicate. – chl Jul 01 '12 at 10:07

2 Answers


The variance of the residuals is just the sum of the squared residuals (you don't need to subtract off the mean, since the mean of the residuals is already 0) divided by $n-p$, where $p$ is the number of parameters estimated in the regression that produced the residuals (2 if you estimated an intercept and one slope). In math notation:

$$ \hat\sigma^2 = \frac{1}{n-p}\sum_{i=1}^{n} r_i^2 $$

The general formula for the confidence interval (a Wald interval) is the parameter estimate plus and minus a table value (which brings in the confidence level) times the standard error of that parameter. So you need a table value, which here is the t-table value with $n-p$ degrees of freedom, and the standard error of $\hat\beta$, which is the square root of the variance formula in your question with $\hat\sigma^2$ plugged in for $\sigma^2$.
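
To make that recipe concrete, here is a minimal sketch in Python with NumPy and SciPy (this is an added illustration, not part of the original answer; the data values and the 95% level are made-up assumptions):

```python
import numpy as np
from scipy import stats

# Made-up example data (assumption, purely for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.1])

n, p = len(x), 2                      # n observations, 2 parameters (intercept + slope)
x_bar = x.mean()

# Least-squares estimates of slope and intercept
beta_hat = np.sum((x - x_bar) * y) / np.sum((x - x_bar) ** 2)
alpha_hat = y.mean() - beta_hat * x_bar

# Residual variance: sum of squared residuals divided by n - p
residuals = y - (alpha_hat + beta_hat * x)
sigma2_hat = np.sum(residuals ** 2) / (n - p)

# Standard error of the slope: sqrt(sigma^2 / sum((x_i - x_bar)^2))
se_beta = np.sqrt(sigma2_hat / np.sum((x - x_bar) ** 2))

# Wald interval: estimate +/- t-quantile * standard error
t_crit = stats.t.ppf(0.975, df=n - p)  # two-sided 95% interval
ci = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)
print(f"slope = {beta_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The same pattern, with the appropriate standard error, gives the interval for the intercept $\hat\alpha$.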

Greg Snow
  • 46,563
  • 2
  • 90
  • 159
  • By variance I mean the variance of beta, not the residual variance – user34790 Jun 30 '12 at 23:10
  • The formula for the variance of beta is just what you have written above. Think back to the standard error of the mean for one-sample cases. The variance version of that standard error is $\frac{\sigma^2}{n}$. The variance for beta is similar, but instead of dividing by the sample size $n$, the variation depends on the distribution of the predictor variable ($x$). If the x values are more spread out, then there is less variability in the estimate; if the x values are all really close together, then a small change in one y value will have a bigger effect on the estimate of the slope, hence the denominator. – Greg Snow Jul 01 '12 at 01:48
  • Yeah, that's true, but how do we know when to use which formula? I mean, how was this formula derived? – user34790 Jul 01 '12 at 13:57
  • You can derive the formula by starting with the equation for the slope, $\frac{ \sum{(x_i - \bar{x}) y_i} }{ \sum{(x_i-\bar{x})^2} }$, and taking the variance of that. The $y_i$ are the only random quantities; everything else is fixed. So if you use the rules for the variance of a sum and the variance of a constant times your variable, you find $\sigma^2$ as the variance of the $y_i$'s, and all the constant pieces cancel until you are left with just the denominator above (this algebra is written out after this thread). – Greg Snow Jul 02 '12 at 21:26
  • I don't get how the equation for the slope comes to be that – user34790 Jul 03 '12 at 01:43
  • OK, what equation do you use for the slope? (There are many variations, so we might as well start with one you are comfortable with.) – Greg Snow Jul 03 '12 at 16:35
  • I use the simple one, $(y_2-y_1)/(x_2-x_1)$ – user34790 Jul 03 '12 at 16:50
  • And what do you do when there is a 3rd point? That formula is fine if all you have is 2 points, but in that case it is difficult to do meaningful inference: you have no measure of the variation. If you have more than 2 data points and you only use the first 2 to calculate the slope, then you are at the mercy of the ordering in the data set (and all the formulas talked about so far don't work). If you don't already know a formula for the slope, then you really need to spend some time with a good regression textbook/class; answers here will not be enough. – Greg Snow Jul 03 '12 at 17:10
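
For reference, here is the algebra from the comments above written out (an added sketch, not part of the original exchange). Treating the $x_i$ as fixed and the $y_i$ as independent with common variance $\sigma^2$, write $\hat\beta = \sum_i c_i y_i$ with $c_i = (x_i - \bar{x})/\sum_j (x_j - \bar{x})^2$. Then

$$ \operatorname{Var}(\hat\beta) = \sum_i c_i^2 \operatorname{Var}(y_i) = \sigma^2 \, \frac{\sum_i (x_i - \bar{x})^2}{\left(\sum_j (x_j - \bar{x})^2\right)^2} = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}. $$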

Because $\hat\beta = (X^TX)^{-1} X^T Y$ and $\mathrm{Var}(CY)=C \,\mathrm{Var}(Y)\, C^T$ (the matrix analogue of the scalar rule $\mathrm{Var}(cY)=c^2 \mathrm{Var}(Y)$), the $X$s appear in the denominator of the variance estimate.
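
Carrying that through (an added sketch, assuming homoskedastic errors, i.e. $\operatorname{Var}(Y) = \sigma^2 I$), with $C = (X^TX)^{-1}X^T$:

$$ \operatorname{Var}(\hat\beta) = (X^TX)^{-1}X^T \left(\sigma^2 I\right) X (X^TX)^{-1} = \sigma^2 (X^TX)^{-1}. $$

For simple linear regression the slope entry of $\sigma^2 (X^TX)^{-1}$ works out to exactly $\sigma^2/\sum_i (x_i - \bar{x})^2$, the formula from the question.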

Michael R. Chernick
  • Your first expression has a term that's gone missing. This is also very good *intuition* but it's not really the *explanation*. – cardinal Jun 30 '12 at 22:49
  • Fixed. Because the $\hat\beta_1$ component of the regression parameter is a linear combination of the $y$s, the $x$s appear as they do in the formula. This is maybe handwavy rather than explicit, but right nonetheless. – Michael R. Chernick Jun 30 '12 at 23:03
  • Yes. I guess what I was getting at was that the "reason" the $X$s appear in the denominator is through the determinant when taking the inverse of $X^T X$. But it is good intuition that the inverse of the matrix acts much like a division would in the case of real numbers. (Of course, there are algebraic notions at play here, but they're beyond what is necessary to address the problem at hand.) – cardinal Jun 30 '12 at 23:30