I realize that this is a very basic question, but I can't find an answer anywhere.

I'm computing regression coefficients using either the normal equations or QR decomposition. How can I compute standard errors for each coefficient? I usually think of standard errors as being computed as:

$SE_{\bar{x}} = \frac{\sigma_{x}}{\sqrt{n}}$

What is $\sigma_{x}$ for each coefficient? What is the most efficient way to compute this in the context of OLS?

Glen_b
Belmont
  • The computation of the variance of the regression coefficients is detailed in [this Wikipedia Page](https://en.wikipedia.org/wiki/Proofs_involving_ordinary_least_squares#Least_squares_estimator_for_%CE%B2) – Clement H. Jan 28 '20 at 09:55

1 Answer

When doing least squares estimation (assuming a normal random component), the regression parameter estimates are normally distributed with mean equal to the true regression parameter and covariance matrix $\sigma^2 (X^TX)^{-1}$, which is estimated by $\Sigma = s^2\cdot(X^TX)^{-1}$, where $s^2$ is the residual variance estimate (the residual sum of squares divided by $n - p$, with $p$ the number of columns of $X$) and $X$ is the design matrix. $X^T$ is the transpose of $X$, and $X$ is defined by the model equation $Y=X\beta+\epsilon$, with $\beta$ the regression parameters and $\epsilon$ the error term. The estimated standard deviation of a beta parameter is obtained by taking the corresponding diagonal term of $(X^TX)^{-1}$, multiplying it by the sample estimate of the residual variance, and then taking the square root. This is not a very simple calculation, but any software package will compute it for you and provide it in the output.
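
For anyone who wants to see the recipe spelled out, here is a minimal NumPy sketch of the normal-equations route. The function name `ols_standard_errors` and the simulated data are my own illustration, not something from Draper and Smith, and I am assuming $X$ has full column rank and already contains a column of ones if you want an intercept.

```python
import numpy as np

def ols_standard_errors(X, y):
    """Return (beta_hat, standard_errors, s2) for the model y = X @ beta + error.

    Hypothetical helper for illustration. X must already include a column
    of ones if an intercept is wanted, and must have full column rank.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)   # solve the normal equations

    residuals = y - X @ beta_hat
    s2 = residuals @ residuals / (n - p)       # residual variance estimate

    cov_beta = s2 * np.linalg.inv(XtX)         # estimated covariance of beta_hat
    return beta_hat, np.sqrt(np.diag(cov_beta)), s2


# Made-up data, for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)

beta_hat, se, s2 = ols_standard_errors(X, y)
print(beta_hat, se)
```

Forming the full inverse is fine for a handful of predictors. With many columns or a badly conditioned $X$, a QR-based approach is usually the safer choice.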

Example

On page 134 of Draper and Smith (referenced in my comment), they provide the following data for fitting by least squares a model $Y = \beta_0 + \beta_1 X + \varepsilon$ where $\varepsilon \sim N(0, \mathbb{I}\sigma^2)$.

                      X                      Y                    XY
                      0                     -2                     0
                      2                      0                     0
                      2                      2                     4
                      5                      1                     5
                      5                      3                    15
                      9                      1                     9
                      9                      0                     0
                      9                      0                     0
                      9                      1                     9
                     10                     -1                   -10
                    ---                     --                   ---
Sum                  60                      5                    32
Sum of  Squares     482                     21                   528

Looks like an example where the slope should be close to 0.

$$X^t = \pmatrix{ 1 &1 &1 &1 &1 &1 &1 &1 &1 &1 \\ 0 &2 &2 &5 &5 &9 &9 &9 &9 &10 }.$$

So

$$X^t X = \pmatrix{n &\sum X_i \\ \sum X_i &\sum X_i^2} = \pmatrix{10 &60 \\60 &482}$$

and

$$\eqalign{ (X^t X)^{-1} &= \pmatrix{ \frac{\sum X_i^2}{n \sum (X_i - \bar{X})^2} &\frac{-\bar{X}}{\sum (X_i-\bar{X})^2} \\ \frac{-\bar{X}}{\sum (X_i-\bar{X})^2} &\frac{1}{\sum (X_i-\bar{X})^2} } \\ &= \pmatrix{\frac{482}{10(122)} &-\frac{6}{122} \\ -\frac{6}{122} &\frac{1}{122}} \\ &= \pmatrix{0.395 &-0.049 \\ -0.049 &0.008} }$$

where $\bar{X} = \sum X_i/n = 60/10 = 6$.
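
If you want to check these matrices numerically, a short NumPy sketch along the following lines should do it. The variable names are mine; the only inputs are the ten $X$ values from the table.

```python
import numpy as np

x = np.array([0, 2, 2, 5, 5, 9, 9, 9, 9, 10], dtype=float)
n = x.size

X = np.column_stack([np.ones(n), x])   # design matrix: intercept column, then X
XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)

# Closed form for simple linear regression, matching the displayed formula.
x_bar = x.mean()
Sxx = np.sum((x - x_bar) ** 2)
closed_form = np.array([[np.sum(x**2) / (n * Sxx), -x_bar / Sxx],
                        [-x_bar / Sxx,             1.0 / Sxx]])

print(XtX)
print(np.round(XtX_inv, 3))
print(np.allclose(XtX_inv, closed_form))
```

The numerical inverse and the closed form should agree up to floating-point error.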

The estimate for $\beta$ is

$$\hat{\beta} = (X^TX)^{-1} X^TY = \pmatrix{b_0 \\ b_1} = \pmatrix{\bar{Y} - b_1 \bar{X} \\ S_{xy}/S_{xx}}.$$

Here $S_{xy} = 32 - (60)(5)/10 = 2$ and $S_{xx} = 122$, so $b_1 = 2/122 = 1/61 \approx 0.0164$ and $b_0 = 0.5 - 0.0164(6) \approx 0.402$.

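From $(X^TX)^{-1}$ above, $s_{b_1} = s_e\sqrt{1/122}$ and $s_{b_0} = s_e\sqrt{482/1220}$ (the square roots of the diagonal entries, which round to 0.008 and 0.395), where $s_e$ is the estimated standard deviation of the error term. Here $s_e = \sqrt{2.3084}$, so $s_{b_1} \approx 0.138$ and $s_{b_0} \approx 0.955$.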
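
Since the question also mentions QR decomposition, here is a hedged sketch of the same calculation done that way, using the data from the table. It relies on the identity $(X^TX)^{-1} = R^{-1}R^{-T}$ when $X = QR$, so the diagonal of $(X^TX)^{-1}$ is just the row sums of squares of $R^{-1}$. It follows the rule stated earlier: multiply the diagonal term by the residual variance and take the square root. The variable names are my own.

```python
import numpy as np

x = np.array([0, 2, 2, 5, 5, 9, 9, 9, 9, 10], dtype=float)
y = np.array([-2, 0, 2, 1, 3, 1, 0, 0, 1, -1], dtype=float)
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

Q, R = np.linalg.qr(X)                   # reduced QR: X = Q R, R is p x p upper triangular
b = np.linalg.solve(R, Q.T @ y)          # coefficients from R b = Q^T y
resid = y - X @ b
s2 = resid @ resid / (n - p)             # residual variance estimate

R_inv = np.linalg.inv(R)                 # (X^T X)^{-1} = R^{-1} R^{-T}
diag_XtX_inv = np.sum(R_inv**2, axis=1)  # row sums of squares give its diagonal
se = np.sqrt(s2 * diag_XtX_inv)

print(np.round(b, 4), np.round(s2, 4), np.round(se, 4))
```

This never forms $X^TX$ explicitly, which is the main numerical advantage of the QR route. For this data the standard errors should come out near 0.955 for the intercept and 0.138 for the slope.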

Sorry that the equations didn't carry subscripting and superscripting when I cut and pasted them. The table didn't reproduce well either because the spaces got ignored. The first string of 3 numbers corresponds to the first values of X, Y and XY, and the same for the following strings of three. After Sum come the sums for X, Y and XY respectively, and then the sums of squares for X, Y and XY respectively. The 2x2 matrices got messed up too. The values after the brackets should be in brackets underneath the numbers to the left.

Michael R. Chernick
  • I've actually also wondered what's happening behind the curtains. Could you possibly give an example with a 3x3 matrix? – Max Gordon May 07 '12 at 05:02
  • 2
    Not meant as a plug for my book, but I go through the computations of the least squares solution in simple linear regression (Y=aX+b) and calculate the standard errors for a and b, pp. 101-103, The Essentials of Biostatistics for Physicians, Nurses, and Clinicians, Wiley 2011. A more detailed description can be found in Draper and Smith, Applied Regression Analysis, 3rd Edition, Wiley, New York, 1998, pages 126-127. In my answer that follows I will take an example from Draper and Smith. – Michael R. Chernick May 07 '12 at 15:53
  • 1
    To Bill's comment below: Thanks very much, Bill. It is much more time consuming to operate on this website the way the moderators want us to compared to any of the many discussion groups I post on. Just coming up with good posts takes up enough time. I don't plan to do much extra. I also don't expect people to fix up the equations for me. So in the future I will just try to avoid equations or use them only in attachments (if I can figure out how to do that easily). – Michael R. Chernick May 07 '12 at 21:01
  • 10
    When I started interacting with this site, Michael, I had similar feelings. With experience, they have changed. It's worthwhile knowing some $\TeX$ and once you do, it's (almost) as fast to type it in as it is to type in anything in English. I also learned, by studying exemplary posts (such as many replies by @chl, cardinal, and other high-reputation-per-post users), that providing *references,* clear *illustrations,* and well-thought out *equations* is usually highly appreciated and well received. High quality is one thing distinguishing this site from most others. – whuber May 07 '12 at 21:19
  • 2
    That is all nice, Bill, and it is nice that so many people are dedicated to giving those high quality posts. I may use LaTeX for other purposes, like publishing papers. But I don't have the time to go to all the effort that people expect of me on this site. I am not going to invest the time just to provide service on this site. – Michael R. Chernick May 07 '12 at 21:42
  • 2
    I agree with @whuber, once you are fluent in $\TeX$, writing equations takes no more time than writing sentences. For me, it's not as much about performing a service for the sake of everyone else but about pride in how my posts look. Not to mention, this level of fluency with $\TeX$ helps me (and probably many of the regulars on this site) in my work regularly, so practicing it is useful in its own right. I'd also say that I'm much more likely to read an answer that has shown the slightest bit of effort with regard to formatting, and I'm probably not alone there. – Macro May 08 '12 at 03:12
  • 2
    I get your point. I think I am raising a bigger issue. This is just one of many things about this site that requires those posting to put in extra time and effort. Why should we be expected to do this? – Michael R. Chernick May 08 '12 at 03:36
  • 1
    I have been on as a member for less than one week and have already amassed over 900 reputation points. That could only happen because I went crazy spending an inordinate amount of time reading questions, commenting on questions, answering questions and writing questions. Yes, I get some satisfaction that people like my posts and reward me with points, and I am glad that I can help people. But let's not forget that our time is valuable and this work is voluntary. – Michael R. Chernick May 08 '12 at 03:38
  • 1
    I have clients that pay me $200 for an hour of my services and I am sure that most others on this site are devoting valuable time to this. So I don't understand why we would think it is important to spend even more of our time learning Latex and other things to pretty up our posts or to give very detailed answers that we continually edit when we find ways to improve it. – Michael R. Chernick May 08 '12 at 03:38
  • 5
    I think the disconnect is here: "This is just one of many things about this site that requires those posting to put in extra **time** and effort" - @whuber and I are both saying that it, in fact, does not take extra time if you know how to do it. We don't learn $\TeX$ so that we can post on this site - we (at least I) learn $\TeX$ because it's an important skill to have as a statistician and happens to make posts much more readable on this site. – Macro May 08 '12 at 11:05
  • 3
    Like many of the people on here, yes, I work as a statistician, but I also happen to find it fun - this site is recreational for me and it's a nice bonus that others find some of my posts useful. If you find marking up your equations with $\TeX$ to be work and don't think it's worth learning then so be it, but know that some of your content will be overlooked. – Macro May 08 '12 at 11:16