
I am currently going through "Introductory Econometrics: A Modern Approach" by Wooldridge and have a question about the standard error formula.

The textbook gives the following formula for the standard error of the slope estimator in simple linear regression, where the model is $y_i = \beta_0 + \beta_1 x_i + u_i$:

$$SE(\hat\beta_1) = \frac{\hat \sigma }{\sqrt {SST_x}} $$

where:

$$\hat \sigma = \sqrt \frac{SSR}{n-2}$$

$$SST_x = \sum_{i=1}^n (x_i - \bar x)^2$$

Using the above formulas I was able to reproduce the standard error reported by stats packages. However, I came across another formula:

$$SE = \frac{\sigma}{\sqrt n}$$

Can someone please explain the link between the two? I was not able to find any material relating them, only separate explanations.
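For reference, here is a minimal sketch (simulated data; `statsmodels` is just one possible package) of how I checked the first formula against a package's reported standard error:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=3.0, size=n)

fit = sm.OLS(y, sm.add_constant(x)).fit()

ssr = np.sum(fit.resid ** 2)          # SSR: sum of squared residuals
sigma_hat = np.sqrt(ssr / (n - 2))    # \hat\sigma
sst_x = np.sum((x - x.mean()) ** 2)   # SST_x

print(sigma_hat / np.sqrt(sst_x))     # textbook formula
print(fit.bse[1])                     # slope SE reported by the package
```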

Is it possible that the last formula is giving the *population* standard error? $\sigma$ has no hat: $\hat{\sigma}$ would denote an estimated standard deviation, while $\sigma$ could denote the population (true / unknown) standard deviation. – Paul Hewson Dec 02 '19 at 14:24

Please say more about where you found the second formula, with a link to it if possible. It looks like the formula for the standard error of the mean of $n$ observations rather than the standard error for a coefficient in linear regression. – EdM Dec 02 '19 at 14:55

@EdM yes, the second formula represents the standard error of the mean. I was wondering if there is any relation between the two. – Serge Kashlik Dec 02 '19 at 14:59

2 Answers


Apart from the hat over the $\sigma$ in the first instance, these are examples of a common formula. In both cases there is a square root of a fraction; the numerator is a variance $\sigma^2$ or its estimated value $\hat \sigma^2;$ and the denominator--as it turns out--can be understood as the squared length of a vector of explanatory values in the simplest kind of regression model. Compare the red equalities in the two highlighted equations below.


Consider the model $$y_i = \beta x_i + \varepsilon_i\tag{1}$$ where $\beta$ is to be estimated from data $(x_i,y_i)$ and the $\varepsilon_i$ are assumed to be uncorrelated, zero-mean random variables, all with variance $\sigma^2$ (which is not known). The Ordinary Least Squares estimate is

$$\hat \beta = \frac{\sum_i x_i y_i}{\sum_i x_i x_i} = \frac{\sum_i x_i y_i}{|x|^2}$$

(using a simplified vector notation for the sum of squares of the $x_i$ in the denominator, which we may interpret as the squared Euclidean length of the vector $(x_i)$). Because

$$\operatorname{Var}(y_i) = \operatorname{Var}(\beta x_i + \varepsilon_i) = \operatorname{Var}(\varepsilon_i) = \sigma^2$$

and the covariances of distinct $y_i$ and $y_j$ are zero, compute that

$$\operatorname{Var}(\hat\beta) = \operatorname{Var}\left(\frac{\sum_i x_i y_i}{\sum_i x_i x_i}\right) = \sum_i \left(\frac{x_i}{|x|^2}\right)^2 \sigma^2 = \frac{|x|^2}{\left(|x|^2\right)^2}\sigma^2 = \frac{\sigma^2}{|x|^2}.\tag{2}$$
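If it helps, here is a small simulation sketch (arbitrary values for $\beta,$ $\sigma,$ and the design vector $x$) illustrating formula $(2)$:

```python
# Simulation sketch of (2): for the no-intercept model y_i = beta*x_i + e_i,
# the variance of the OLS slope over repeated samples is sigma^2 / |x|^2.
import numpy as np

rng = np.random.default_rng(1)
beta, sigma = 2.0, 3.0
x = rng.uniform(1, 5, size=20)          # fixed design vector
xx = np.sum(x ** 2)                     # |x|^2

betas = []
for _ in range(20000):
    y = beta * x + rng.normal(scale=sigma, size=x.size)
    betas.append(np.sum(x * y) / xx)    # \hat\beta = sum(x*y) / |x|^2

print(np.var(betas))                    # empirical variance of \hat\beta
print(sigma ** 2 / xx)                  # sigma^2 / |x|^2 from formula (2)
```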

In the special case where $x_i=1$ for all $i,$ $|x|^2 = \sum_i 1^2 = n$ and the model is

$$y_i = \beta + \varepsilon_i$$

with

$$\hat \beta = \frac{\sum_i (1)y_i}{|x|^2} = \frac{\sum_i y_i}{n} = \bar y,$$

whence

$$\operatorname{Var}(\bar y) = \color{red}{\operatorname{Var}(\hat\beta) = \frac{\sigma^2}{|x|^2}} = \frac{\sigma^2}{n}.$$

Taking square roots gives the second formula in the question. Bear in mind the origin of the denominator $n:$ it is the squared length of the vector of explanatory variables $(x_i = 1).$
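A quick sketch of this special case (arbitrary values for the mean $\beta,$ $\sigma,$ and $n$): the variance of $\bar y$ over repeated samples matches $\sigma^2/n.$

```python
# Sketch of the special case x_i = 1: the OLS "slope" is the sample mean \bar y,
# and its variance over repeated samples is sigma^2 / n (the squared SEM).
import numpy as np

rng = np.random.default_rng(2)
beta, sigma, n = 5.0, 3.0, 25     # arbitrary mean, error SD, sample size

means = [rng.normal(beta, sigma, size=n).mean() for _ in range(20000)]

print(np.var(means))      # empirical variance of \bar y
print(sigma ** 2 / n)     # sigma^2 / n from formula (2) with |x|^2 = n
```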


The first formula arises by fitting the model

$$y_i = \alpha + \beta x_i + \varepsilon_i = \alpha z_i + \beta x_i + \varepsilon_i$$

(where $z_i=1$ for all $i$) in two steps. In the first step, both $y$ and $x$ are fit to $z$ using the simple model $(1)$ and then are replaced by their residuals. (Please see https://stats.stackexchange.com/a/46508/919 and https://stats.stackexchange.com/a/113207/919 for the justification and explanations of this fundamental step, which is called "controlling for" or "taking out the effect of" the variable $z.$)

In other words, $y_i$ is replaced by $y_{\cdot i}=y_i - \bar y$ and $x_i$ is replaced by $x_{\cdot i}=x_i - \bar x.$ Because this removes all the discernible effects of $z,$ $\alpha$ is no longer needed and we are left to fit the model

$$y_{\cdot i} = y_i - \bar y = \beta (x_i - \bar x) + \varepsilon_i =\beta x_{\cdot i} + \varepsilon_i.$$

This, too, is in the form of model $(1).$ Formula $(2)$ states

$$\color{red}{\operatorname{Var}(\hat \beta) = \frac{\sigma^2}{|x_\cdot|^2}} = \frac{\sigma^2}{\sum_i \left(x_i - \bar x\right)^2}.$$

Taking square roots gives the first formula in the question, except here we are using $\sigma$ instead of $\hat \sigma.$
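A short numerical sketch of this two-step equivalence (simulated data): the slope from the centered, no-intercept fit coincides with the slope from the full fit with an intercept.

```python
# Sketch of the two-step fit: regressing centered y on centered x with no
# intercept gives the same slope estimate as the full model with an intercept.
import numpy as np

rng = np.random.default_rng(3)
n = 40
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=3.0, size=n)

# Full model with intercept, via least squares on [1, x]
X = np.column_stack([np.ones(n), x])
alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Two-step version: take out the effect of the constant (i.e. center),
# then fit model (1) with no intercept to the residuals
xc, yc = x - x.mean(), y - y.mean()
beta_hat_2step = np.sum(xc * yc) / np.sum(xc ** 2)

print(beta_hat, beta_hat_2step)   # identical up to rounding
```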

This leads us to the last unresolved issue: when you know (or assume the value of) $\sigma,$ there's nothing left to do: we have our standard errors of estimate. But when you don't know $\sigma,$ about the only thing you can do (short of an infinite regress where you try to estimate the standard error of $\hat \sigma$ and so on) is to replace the occurrence of $\sigma^2$ in formula $(2)$ by its estimate $\hat\sigma^2.$

whuber

Any statistic, a "quantity computed from values in a sample", can have a standard error. The standard error of a statistic is "the standard deviation of its sampling distribution or an estimate of that standard deviation".* That is, if you repeated the same experiment a large number of times, the standard error provides a measure of the reproducibility of the value of the computed statistic.

The last formula you wrote, $SE = \frac{\sigma}{\sqrt n}$, is most strictly the standard error of the mean value (SEM) for samples of size $n$ of a single variable that has a true standard deviation $\sigma$ of its values in the population from which you are sampling. More typically, you have an estimate $s$ of the standard deviation based on your sample,** and calculate $SEM=\frac{s}{\sqrt n}$. (I prefer to use $SEM$ for standard errors of mean values of single variables, and reserve $SE$ for standard errors of other statistics.)

In a simple linear regression as in your first equation you have two variables of interest, whose jointly observed values provide the statistic of the estimated slope, $\hat \beta_1$ in your nomenclature. You can write this sample-based estimate as proportional to the ratio of the standard errors of the means of the $y$ and $x$ values, with the proportionality constant equal to their sample correlation coefficient.
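A quick sketch of that relationship with simulated data (the $\sqrt n$ factors cancel, so the ratio of SEMs equals the ratio of sample standard deviations):

```python
# Sketch: the OLS slope equals the sample correlation times the ratio of SEMs.
import numpy as np

rng = np.random.default_rng(4)
n = 60
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=3.0, size=n)

r = np.corrcoef(x, y)[0, 1]
sem_x = np.std(x, ddof=1) / np.sqrt(n)
sem_y = np.std(y, ddof=1) / np.sqrt(n)

xc, yc = x - x.mean(), y - y.mean()
beta1_hat = np.sum(xc * yc) / np.sum(xc ** 2)   # OLS slope

print(beta1_hat)
print(r * sem_y / sem_x)    # same value
```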

With respect to the standard error of the slope estimate, note that you could choose to write $\sqrt {SST_x}$ as $\sqrt{n(n-1)}\, SEM_x$ (where $SEM_x = s_x/\sqrt n$ is the standard error of the mean of the $x$ values). Then you could write:

$$SE(\hat\beta_1) = \frac{\sqrt {SSR}}{\sqrt {n(n-1)(n-2)}\, SEM_x} $$

which shows that (at constant SSR, sum of squares of residuals) the standard error of your estimate of the slope is lower if the distribution of $x$ values, represented by $SEM_x$, is wider. (That's why in experimental design it can be helpful to arrange for a wide range of values for the independent variable $x$.) Other than that, however, there is no simple general dependency between the standard error of the estimate of the slope in simple linear regression and the standard errors of the $x$ or $y$ values separately. What matters is the linear relationship between $y$ and $x$ and how successfully that relationship leads to small residuals, as represented by $SSR$.
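For concreteness, a sketch (simulated data) checking this algebra numerically:

```python
# Sketch: sqrt(SST_x) = sqrt(n*(n-1)) * SEM_x, so the slope SE can be written
# as sqrt(SSR) / (sqrt(n*(n-1)*(n-2)) * SEM_x).
import numpy as np

rng = np.random.default_rng(5)
n = 30
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=3.0, size=n)

xc, yc = x - x.mean(), y - y.mean()
beta1_hat = np.sum(xc * yc) / np.sum(xc ** 2)
resid = yc - beta1_hat * xc              # residuals of the fitted line
ssr = np.sum(resid ** 2)
sst_x = np.sum(xc ** 2)
sem_x = np.std(x, ddof=1) / np.sqrt(n)

print(np.sqrt(ssr / (n - 2)) / np.sqrt(sst_x))                   # textbook form
print(np.sqrt(ssr) / (np.sqrt(n * (n - 1) * (n - 2)) * sem_x))   # rewritten form
```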


*Sometimes you need to read carefully to infer whether an author is describing a true population value or a sample-based estimate.

**Sample-based estimates are often distinguished by a "hat" symbol, like $\hat \sigma$, but $s$ has long-standing use to represent a sample-based standard deviation for values of a single variable.

EdM