Why don't we use the unbiased sample variance to calculate the standard error?

Question

The standard error is an approximation of the standard deviation of the sampling distribution of the sample mean. The true standard deviation of the sampling distribution, $\sigma_{\bar x}$, is:

$$\sigma _{\bar x} = \frac{\sigma}{\sqrt{n}}$$

where $n$ is the sample size and $\sigma$ is the standard deviation of the variable we are interested in. As $\sigma$ is unknown, we replace it with $s$, the standard deviation of our sample, which gives the standard error:

$$SE_{\bar x} = \frac{s}{\sqrt{n}}$$

Why do we use $s$, the sample standard deviation, rather than the unbiased sample standard deviation $\frac{n\cdot s}{n-1}$? The unbiased sample standard deviation $\frac{n\cdot s}{n-1}$ would be a better estimate of the standard deviation of the variable we are interested in, wouldn't it? Intuitively, I would rather calculate the standard error as:

$$SE_{\bar x} = \frac{n\cdot s}{(n-1)\sqrt{n}} = \frac{s \sqrt{n}}{n-1}$$

A bit of terminology. $s$ in your formula, the _so-called_ "sample sd" in jargon, has denominator $n-1$ and is correctly named the "unbiased estimate of population sd from the sample". So it is what substitutes for $\sigma$, since the latter is unknown. No need to correct it for the right d.f. — ttnphns, Sep 26 '14 at 15:39
So $s = \frac{n-1}{n}\cdot \sqrt{\frac{1}{n}\sum (x_i-\bar x)^2}$, where all $x_i$ are the individuals in my sample and $\bar x$ is the mean of my sample. Is that right? $s$ is already the unbiased estimate from the sample. Ok that makes sense. My issue was just a matter of what symbols represent what. I guess you can post your comment as an answer. — Remi.b, Sep 26 '14 at 15:46
$s$ is the sqrt of "sample variance" (more properly called "unbiased estimate of population variance") which is computed on d.f. $n-1$ because we rely on sample's _mean_ as if on the true (unknown) population mean. — ttnphns, Sep 26 '14 at 15:52
So, does it mean that $s=\sqrt{\frac{n-1}{n^2}\sum{(x_i-\bar x)^2}}$? mmmhhh, I'm kinda lost here! can you please give me the formulas to calculate $s$ from the sample data? — Remi.b, Sep 26 '14 at 15:55
@ttnphns Even with $n - 1$ on the bottom, "unbiased" is true only for the variance, not the SD. Still, we don't usually bother with a correction factor. — Nick Cox, Sep 26 '14 at 16:14
@ttnphns In fact we have a thread [about why the bottom is unbiased for the variance but not the SD](http://stats.stackexchange.com/questions/11707/why-is-sample-standard-deviation-a-biased-estimator-of-sigma) - the short answer being Jensen's inequality, but the long answer is quite interesting also. — Silverfish, Jan 05 '15 at 22:18

score 7 · Accepted Answer · answered Sep 26 '14 at 16:03

The $n$ in $\sigma/\sqrt{n}$ has nothing to do with how you estimate $\sigma$. It has to do with the fact that the average of $n$ iid random variables $X_i$ has variance $\sigma^2/n$ when $\mbox{Var}(X_i) = \sigma^2$.
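
To spell out that step, here is a short derivation sketch. It assumes the $X_i$ are independent with common variance $\sigma^2$, so the variance of the sum is the sum of the variances:

$$\mbox{Var}(\bar X) = \mbox{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mbox{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

Taking the square root gives $\sigma_{\bar x} = \sigma/\sqrt{n}$, whether or not $\sigma$ is known.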

If $\sigma$ is unknown, you estimate it using $s = \sqrt{\frac1{n-1}\sum (X_i-\bar X)^2}$, so that your estimate of the standard error is

$$\widehat{SE}(\bar X) = \sqrt{\frac{\sum(X_i-\bar X)^2}{n(n-1)}}$$
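
As a quick numerical check, here is a minimal Python sketch that computes the estimated standard error two ways: as $s/\sqrt{n}$ with the $n-1$ denominator inside $s$, and from the closed form with $n(n-1)$ in the denominator. The sample values and the function names are made up for illustration; only the standard library is used.

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    n = len(sample)
    # statistics.stdev divides the sum of squares by n - 1.
    s = statistics.stdev(sample)
    return s / math.sqrt(n)


def standard_error_closed_form(sample):
    """Same quantity written as sqrt(sum((x - mean)^2) / (n * (n - 1)))."""
    n = len(sample)
    mean = sum(sample) / n
    sum_of_squares = sum((x - mean) ** 2 for x in sample)
    return math.sqrt(sum_of_squares / (n * (n - 1)))


# Hypothetical sample, chosen only so the arithmetic is easy to follow.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

se = standard_error(sample)
se_closed = standard_error_closed_form(sample)
print(se, se_closed)
```

For this sample the mean is 5 and the sum of squared deviations is 32, so both functions give $\sqrt{32/56} \approx 0.756$. The $n-1$ belongs to the estimate of $\sigma$, and the separate $\sqrt{n}$ comes from averaging $n$ observations.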
