Regarding estimators of variance from an iid sample of size $n$, Karl Ove Hufthammer says, in "Estimates of variance from an iid sample":

> if they do have a normal distribution, dividing by n+1 (sic!) will give the lowest mean square error.

Why does it give the lowest mean square error?

What does he mean by "(sic!)"?

Thanks!

Tim
  • *sic* is Latin for 'thus', usually in text meaning "exactly as in the original" (even if that looks wrong, or unlikely, or questionable) – Glen_b Feb 02 '14 at 02:29
  • @Glen_b: I know that meaning of sic, but I don't understand what Karl means in his comment. – Tim Feb 02 '14 at 02:33
  • *That* is reasonable to ask in comments on the original post; it's a clarification of his meaning, not a whole new question. However, I may offer a suggestion: since $n+1$ looks rather like the more usual $n-1$ (or $n$ for that matter), one might be tempted to think he simply made a sign error or something, and *sic* is perhaps intended to emphasize that, no, even if it might appear possible that he meant $n-1$ or $n$, he really does mean $n+1$, just as originally typed. – Glen_b Feb 02 '14 at 02:38
  • You may be interested in this paper: http://www.tandfonline.com/doi/abs/10.1080/00031305.2012.735209?af=R – ekvall Feb 02 '14 at 07:49
  • @KarlOskar interesting; if I did it right, the estimator there corresponds to using $d = (n-1)\,+\,2/n$ in the notation of my answer. – Glen_b Feb 02 '14 at 22:45

1 Answer

Define

$$s^2_d = \frac{\sum_{i=1}^n\left(x_i-\bar{x}\,\right)^2}{d}$$

For an iid sample from a normal distribution with variance $\sigma^2$, the statistic $(n - 1)s_{n-1}^2 / \sigma^2$ follows a $\chi^2_{n-1}$ distribution. A $\chi^2_{n-1}$ has mean $n-1$ and variance $2(n-1)$. Hence $\text{E}(s_{n-1}^2) = \sigma^2$, and $\text{Var}(s_{n-1}^2) = 2\frac{\sigma^4 }{ n - 1}$.
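As a rough sanity check on those two moments, here is a minimal simulation sketch using only the Python standard library. The function name `simulate_s2` and the choices $n=5$, $\sigma=1$, 20000 replications and seed 0 are illustrative assumptions, not part of the derivation.

```python
import random
import statistics

def simulate_s2(n=5, sigma=1.0, reps=20000, seed=0):
    """Monte Carlo mean and variance of s^2_{n-1} for iid normal samples.

    All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
        estimates.append(ss / (n - 1))             # divisor d = n - 1
    return statistics.mean(estimates), statistics.variance(estimates)

if __name__ == "__main__":
    mean_s2, var_s2 = simulate_s2()
    print("simulated mean of s^2:    ", mean_s2, "(theory: sigma^2 = 1)")
    print("simulated variance of s^2:", var_s2, "(theory: 2/(n-1) = 0.5)")
```

With these settings the simulated values should land close to $\sigma^2 = 1$ and $2\sigma^4/(n-1) = 0.5$, up to Monte Carlo error.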

Now $s_d^2 = \frac{n-1}{d} s_{n-1}^2$, so

$$\text{Bias}(s_d^2) = E(s_d^2)-\sigma^2 = \frac{n-1}{d}\sigma^2 -\sigma^2=\frac{n-1-d}{d}\sigma^2$$

$$\text {Var}(s_d^2) = 2\sigma^4(n - 1) / d^2$$

$$\text{MSE} = \text{Bias}^2 + \text{Var}$$

Hence

\begin{eqnarray} \text{MSE}(s_d^2) &=& \left(\frac{n-1-d}{d}\right)^2\sigma^4+ 2\sigma^4(n - 1) / d^2\\ &=&\sigma^4\frac{(n-1-d)^2+2(n-1)}{d^2}\\ &=&\sigma^4\,f(d)\,,\end{eqnarray}

where $f(d)=1+\frac{(n-1)^2 + 2(n-1)-2(n-1)d}{d^2}$

$f(d)$ is at a turning point when $f'(d)=0$,

i.e. (multiplying $f'(d)$ by $d^3$) when $d(-2(n-1))-2((n-1)^2 + 2(n-1)-2(n-1)d) = 0$,

which, after dividing through by $2(n-1)$, occurs when $d-2=n-1$, i.e. when $d=n+1$.
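The closed form can also be checked numerically. The sketch below evaluates $\text{MSE}(s_d^2)/\sigma^4$ exactly with rational arithmetic and scans integer divisors; the names `mse_factor` and `best_integer_divisor`, the scan range, and the choice $n=10$ are assumptions made for illustration.

```python
from fractions import Fraction

def mse_factor(n, d):
    """Return MSE(s_d^2) / sigma^4 for a normal sample of size n (exact)."""
    n, d = Fraction(n), Fraction(d)
    bias_sq = ((n - 1 - d) / d) ** 2   # squared bias, in units of sigma^4
    var = 2 * (n - 1) / d ** 2         # variance, in units of sigma^4
    return bias_sq + var

def best_integer_divisor(n, candidates=None):
    """Integer divisor with the smallest MSE among the candidates scanned."""
    if candidates is None:
        candidates = range(1, 3 * n + 1)  # arbitrary illustrative range
    return min(candidates, key=lambda d: mse_factor(n, d))

if __name__ == "__main__":
    n = 10
    for d in (n - 1, n, n + 1, n + 2):
        print(f"d = {d}: MSE / sigma^4 = {mse_factor(n, d)}")
    print("best integer divisor:", best_integer_divisor(n))
```

For $n=10$ the scan picks $d=11$, where the factor equals $2/(n+1) = 2/11$, compared with $2/(n-1) = 2/9$ for the unbiased divisor $d=9$.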

Showing that's a minimum rather than a maximum (or indeed a horizontal point of inflexion) is straightforward, but I'll leave it at that for now.
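For completeness, here is a sketch of that check. Write $f(d) = 1 + A/d^2 - B/d$ with $A=(n-1)(n+1)$ and $B=2(n-1)$, where $A$ and $B$ are shorthand introduced only for this step. Then

$$f''(d) = \frac{6A}{d^4} - \frac{2B}{d^3}, \qquad f''(n+1) = \frac{6(n-1)}{(n+1)^3}-\frac{4(n-1)}{(n+1)^3} = \frac{2(n-1)}{(n+1)^3} > 0 \quad\text{for } n>1,$$

so the turning point is a minimum. Since it is the only turning point for $d>0$, it is the global minimum there, with $f(n+1) = \frac{2}{n+1}$ compared with $f(n-1)=\frac{2}{n-1}$ for the unbiased estimator.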

The relevant Wikipedia page does it more generally, getting a formula in terms of the excess kurtosis (which gives the same result in this case). I may incorporate an outline of that derivation at some point.
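A rough outline of how that more general version might go, assuming the standard result $\text{Var}(s_{n-1}^2) = \sigma^4\left(\frac{2}{n-1}+\frac{\gamma_2}{n}\right)$ for iid data with a finite fourth moment, where $\gamma_2$ denotes the excess kurtosis (a symbol introduced here, not used above). Writing $c = (n-1)/d$,

$$\frac{\text{MSE}(s_d^2)}{\sigma^4} = (c-1)^2 + c^2\left(\frac{2}{n-1}+\frac{\gamma_2}{n}\right)$$

which is minimized at $c = 1\big/\left(1+\frac{2}{n-1}+\frac{\gamma_2}{n}\right)$, i.e. at

$$d = n+1+\frac{n-1}{n}\,\gamma_2$$

Setting $\gamma_2=0$ (the normal case) recovers $d=n+1$.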

Glen_b