
In the textbook An Introduction to Statistical Learning with Applications in R by James et al. (2014), the authors give the following formula for the standard error of the sample mean on page 65:

We have the well-known formula $$\text{Var}(\hat{\mu}) = \text{SE}(\hat{\mu})^2 = \frac{\sigma^2}{n}, \tag{3.7}$$ where $\sigma$ is the standard deviation of each of the realizations $y_i$ of $Y$.$^2$

The corresponding footnote states:

$^2$ This formula holds provided that the $n$ observations are uncorrelated.

I can't wrap my head around this: each $y_i$ has an exact value, so how can it have a standard deviation?

On page 66, they then add:

$$\begin{align}&\text{SE}(\hat{\beta}_0)^2 = \sigma^2 \left[ \frac{1}{n} + \frac{\overline{x}^2}{\sum^n_{i=1}(x_i - \overline{x})^2} \right], \\ &\text{SE}(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum^n_{i=1} (x_i - \overline{x})^2}, \tag{3.8} \end{align}$$ where $\sigma^2 = \text{Var}(\epsilon)$.

Is the $\sigma^2$ in equation $(3.8)$ the same as that in $(3.7)$?

microhaus
Ayoub
  • Hi, could you format this post with LaTeX? Are you familiar with it? – user257566 Jul 14 '21 at 17:57
  • At https://stats.stackexchange.com/a/18609/919 I provide an elementary, intuitive, yet rigorous explanation of what a standard error is. – whuber Jul 14 '21 at 18:04

1 Answer


I can't wrap my head around this: each $y_i$ has an exact value, so how can it have a standard deviation?

Because each $y_i$ is the realization of a random variable. Look at page 61: $Y\approx \beta_0+\beta_1 X$ says that $Y$ is approximately modeled as $\beta_0+\beta_1 X$. In other words, each $y_i$ will not be equal to $\beta_0+\beta_1 x_i$; it will be $\beta_0+\beta_1 x_i+\epsilon_i$ (page 63). So it is not an 'exact' value but a random one, because $\epsilon_i$ is a random variable.
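
To see this concretely, here is a minimal R sketch (the coefficient values, the noise level, and the $x$ values are all invented for illustration): with the same fixed $x_i$, each fresh draw of $\epsilon_i$ produces a different set of $y_i$.

```r
set.seed(1)
beta0 <- 2; beta1 <- 0.5          # true (normally unknown) coefficients, chosen for illustration
x <- c(1, 2, 3, 4, 5)             # fixed predictor values

# Two independent draws of the noise give two different realizations of Y
y_first  <- beta0 + beta1 * x + rnorm(length(x), mean = 0, sd = 1)
y_second <- beta0 + beta1 * x + rnorm(length(x), mean = 0, sd = 1)

rbind(y_first, y_second)          # same x, different y_i each time
```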

If you are interested in knowing the population mean $\mu$ of some random variable $Y$, you can use $n$ observations from $Y$: $y_1,\dots,y_n$. Say the random variable is $Y\sim\mathcal{N}(\mu,\sigma^2)$. The sample mean $\hat\mu=\frac1n\sum_{i=1}^n y_i$ can be used to estimate the unknown population mean $\mu$, but it is itself another random variable, and one can show that $E[\hat\mu]=\mu$ and $\text{Var}(\hat\mu)=\sigma^2/n$. Since each $y_i$ is a random value, its 'exact' value could have been different (when you throw a die and get 4, 4 is an 'exact' value, but it could have been 1, 2, 3, 5, or 6). Therefore you can see why "a single estimate $\hat\mu$ may be a substantial underestimate or overestimate of $\mu$" (page 65). If your observations were an exact representation of $Y$, then $\hat\mu$ would always be equal to $\mu$.

However, "Equation 3.7 also tells us how this deviation shrinks with $n$—the more observations we have, the smaller the standard error of $\hat\mu$." (page 66).

Is the $\sigma^2$ in equation $(3.8)$ the same as that in $(3.7)$?

They share the same role. One could say that in (3.7) the population model is $Y=\mu+\epsilon$, with $E[Y]=\mu$ and $\text{Var}(Y)=\text{Var}(\epsilon)=\sigma^2$: $$\begin{cases} Y=\mu+\epsilon \\ \epsilon\sim\mathcal{N}(0,\sigma^2) \end{cases}\quad\Leftrightarrow\quad Y\sim\mathcal{N}(\mu,\sigma^2)$$ while in (3.8) the population model is $Y=\beta_0+\beta_1X+\epsilon$, with $E[Y]=\beta_0+\beta_1X$ and $\text{Var}(Y)=\text{Var}(\epsilon)=\sigma^2$: $$\begin{cases} Y=\beta_0+\beta_1X+\epsilon \\ \epsilon\sim\mathcal{N}(0,\sigma^2) \end{cases}\quad\Leftrightarrow\quad Y\sim\mathcal{N}(\beta_0+\beta_1X,\sigma^2)$$ In both cases $\sigma^2$ is the variance of the error term, so the two symbols denote the same kind of quantity, even though the models differ.
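
As a sanity check on (3.8), one can run the same kind of simulation for the slope (again an R sketch with invented parameter values): the empirical standard deviation of $\hat\beta_1$ over repeated noise draws agrees with $\sigma/\sqrt{\sum_{i}(x_i-\overline{x})^2}$.

```r
set.seed(1)
n <- 50
beta0 <- 2; beta1 <- 0.5; sigma <- 1
x <- runif(n, 0, 10)              # fixed design, reused in every replication

# Refit the model on fresh noise many times and record the slope estimate
beta1_hat <- replicate(10000, {
  y <- beta0 + beta1 * x + rnorm(n, sd = sigma)
  coef(lm(y ~ x))[2]
})

sd(beta1_hat)                          # empirical SE of beta1-hat
sigma / sqrt(sum((x - mean(x))^2))     # formula (3.8)
```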

Sergio