
The standard error is an estimate of the standard deviation of the sampling distribution of the sample mean. The true standard deviation of the sampling distribution, $\sigma_{\bar x}$, is:

$$\sigma _{\bar x} = \frac{\sigma}{\sqrt{n}}$$

where $n$ is the sample size and $\sigma$ is the standard deviation of the variable we are interested in. As $\sigma$ is unknown, we replace it with $s$, the standard deviation of our sample, and this gives the standard error:

$$SE_{\bar x} = \frac{s}{\sqrt{n}}$$

Why do we use $s$, the sample standard deviation, rather than the unbiased sample standard deviation $\frac{n \cdot s}{n-1}$? The unbiased sample standard deviation $\frac{n \cdot s}{n-1}$ would be a better estimate of the standard deviation of the variable we are interested in, wouldn't it? Intuitively, I would rather calculate the standard error as:

$$SE_{\bar x} = \frac{n\cdot s}{(n-1)\sqrt{n}} = \frac{s \sqrt{n}}{n-1}$$

Remi.b
  • A note on terminology: the $s$ in your formula, the so-called "sample sd" in jargon, has denominator $n-1$ and is correctly named the "unbiased estimate of the population sd from the sample". So it is what substitutes for $\sigma$, since the latter is unknown. There is no need to correct it for the d.f. – ttnphns Sep 26 '14 at 15:39
  • So $s = \frac{n-1}{n}\cdot \sqrt{\frac{1}{n}\sum (x_i-\bar x)^2}$, where all $x_i$ are the individuals in my sample and $\bar x$ is the mean of my sample. Is that right? $s$ is already the unbiased estimate from the sample. Ok that makes sense. My issue was just a matter of what symbols represent what. I guess you can post your comment as an answer. – Remi.b Sep 26 '14 at 15:46
  • $s$ is the sqrt of "sample variance" (more properly called "unbiased estimate of population variance") which is computed on d.f. $n-1$ because we rely on sample's _mean_ as if on the true (unknown) population mean. – ttnphns Sep 26 '14 at 15:52
  • So, does it mean that $s=\sqrt{\frac{n-1}{n^2}\sum{(x_i-\bar x)^2}}$? mmmhhh, I'm kinda lost here! can you please give me the formulas to calculate $s$ from the sample data? – Remi.b Sep 26 '14 at 15:55
  • @ttnphns Even with $n - 1$ on the bottom, "unbiased" is true only for the variance, not the SD. Still, we don't usually bother with a correction factor. – Nick Cox Sep 26 '14 at 16:14
  • @ttnphns In fact we have a thread [about why the bottom is unbiased for the variance but not the SD](http://stats.stackexchange.com/questions/11707/why-is-sample-standard-deviation-a-biased-estimator-of-sigma) - the short answer being Jensen's inequality, but the long answer is quite interesting also. – Silverfish Jan 05 '15 at 22:18

1 Answer


The $n$ in $\sigma/\sqrt{n}$ has nothing to do with how you estimate $\sigma$. It has to do with the fact that the average of $n$ iid random variables $X_i$ has variance $\sigma^2/n$ when $\mbox{Var}(X_i) = \sigma^2$.
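
The claim that the mean of $n$ iid variables has variance $\sigma^2/n$ can be checked numerically. The following is only a rough sketch: the normal distribution, the values of `sigma` and `n`, the number of replications, and the helper name `simulate_sd_of_mean` are all illustrative assumptions, not part of the question.

```python
import numpy as np

def simulate_sd_of_mean(sigma, n, reps=100_000, seed=0):
    """Empirical SD of the sample mean over many simulated samples.

    Assumes normally distributed data purely for illustration; the
    sigma / sqrt(n) result holds for any iid variables with finite variance.
    """
    rng = np.random.default_rng(seed)
    # Each row is one sample of size n drawn with known sigma.
    samples = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
    # SD of the row means approximates the SD of the sampling distribution.
    return samples.mean(axis=1).std()

sigma, n = 2.0, 25
empirical = simulate_sd_of_mean(sigma, n)
theoretical = sigma / np.sqrt(n)
print(empirical, theoretical)
```

Here $\sigma$ is known by construction, so the $\sqrt{n}$ in the denominator appears without any estimation of $\sigma$ being involved.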

If $\sigma$ is unknown, you estimate it using $s = \sqrt{\frac1{n-1}\sum (X_i-\bar X)^2}$, so that your estimate of the standard error is

$$\widehat{SE}(\bar X) = \sqrt{\frac{\sum(X_i-\bar X)^2}{n(n-1)}}$$
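
As a small illustration, the two ways of writing the estimated standard error (first computing $s$ with the $n-1$ denominator and dividing by $\sqrt{n}$, or using the combined formula directly) give the same number. The function names and the example data below are made up for this sketch.

```python
import numpy as np

def standard_error(x):
    """Estimated SE of the mean: s / sqrt(n), with s using the n - 1 denominator."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.std(ddof=1)  # ddof=1 gives the n - 1 denominator
    return s / np.sqrt(n)

def standard_error_direct(x):
    """Same quantity from the combined formula sqrt(sum((x - mean)^2) / (n (n - 1)))."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.sqrt(np.sum((x - x.mean()) ** 2) / (n * (n - 1)))

data = [4.1, 5.3, 3.8, 6.0, 4.7]  # hypothetical sample
print(standard_error(data), standard_error_direct(data))
```

Note that NumPy's `std` defaults to `ddof=0` (denominator $n$), so `ddof=1` has to be passed explicitly to get the $s$ used here.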

Russ Lenth