
Searching through Wikipedia and StackExchange, I managed to understand that, for a set of $N$ normally distributed values, the variance $\textrm{Var}[s^2]$ of the unbiased sample variance $s^2$ is given by

$$\textrm{Var}[s^2] = \frac{2\sigma^4}{N - 1},$$

where $\sigma^2$ is the true variance of the normal distribution. Since the square root of $\textrm{Var}[s^2]$ measures the deviation of $s^2$ from its expected value, is it correct to say that, by error propagation, the deviation of $s = \sqrt{s^2}$ from its expected value is

$$\delta s = \frac{\sigma}{\sqrt{2(N -1)}} \quad ?$$

EDIT: From this answer, it seems that the deviation of the standard deviation is actually

$$ \delta s = \sigma \sqrt{ 1 - \frac{2}{N-1} \cdot \left( \frac{ \Gamma(N/2) }{ \Gamma( \frac{N-1}{2} ) } \right)^2 }; $$

since this result is rather convincing, where's the fault in my own reasoning?
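Both expressions can be checked against a quick Monte Carlo experiment; the sketch below (helper name `sample_sd` is mine, using only the standard library) draws many samples of size $N$, computes the sample standard deviation $s$ of each, and compares the empirical spread of $s$ with the two candidate formulas.

```python
import math
import random

random.seed(1)
sigma, N, trials = 1.0, 10, 100_000

def sample_sd(n, sd):
    """Unbiased sample standard deviation of n draws from N(0, sd^2)."""
    xs = [random.gauss(0.0, sd) for _ in range(n)]
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

ss = [sample_sd(N, sigma) for _ in range(trials)]
mean_s = sum(ss) / trials
sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in ss) / (trials - 1))

# Error-propagation formula from the question
approx = sigma / math.sqrt(2 * (N - 1))

# Exact formula from the EDIT, via log-gammas for numerical stability
log_ratio = math.lgamma(N / 2) - math.lgamma((N - 1) / 2)
exact = sigma * math.sqrt(1 - 2 / (N - 1) * math.exp(2 * log_ratio))

print(sd_s, approx, exact)
```

For $N=10$ the three numbers already agree to a few parts in a hundred, which foreshadows the answer below.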

Ben Bolker
  • See https://stats.stackexchange.com/questions/393316/do-error-bars-on-probabilities-have-any-meaning/393336#393336 – kjetil b halvorsen Mar 05 '19 at 17:47
  • I know that error bars on error bars are redundant, but that's not the point. I'm implementing a blocking procedure in a Monte Carlo simulation, and since I can't just guess autocorrelation time by eye, the only way out I can see is to evaluate the errors on the 'measured' quantities - namely, the errors on errors. – Francesco Arnaudo Mar 06 '19 at 08:42

1 Answer


You did nothing wrong: the first value of $\delta s,$ which was obtained by an approximate method, is a close approximation to the second.

Let's compare the two expressions by using Stirling's approximation

$$\log\left(\Gamma(z)\right) \approx z \log(z) - z + \log(2\pi)/2 - \log(z)/2 + \frac{1}{12z} + O(z^{-2})$$

and the Taylor series approximation

$$\log\left(z+\frac{1}{2}\right) =\log(z) + \log\left(1 + \frac{1}{2z}\right)\approx \log(z) + \frac{1}{2z} - \frac{1}{8z^2} + O(z^{-3}).$$

(Eventually we will set $z=(n-1)/2,$ but in the meantime this notation is more convenient.) Carrying out all calculations modulo $O(z^{-2}),$ use these approximations to estimate

$$\log\left(\Gamma\left(z+\frac{1}{2}\right)\right) - \log\left(\Gamma\left(z\right)\right) \approx \frac{1}{2}\left(\log(z) - \frac{1}{4z}\right) + O(z^{-2}).$$
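As a quick numerical sanity check of this log-gamma estimate, here is a sketch using Python's standard `math.lgamma`:

```python
import math

# Compare lgamma(z + 1/2) - lgamma(z) with (log z - 1/(4z)) / 2
# for a few moderate values of z.
for z in (5.0, 20.0, 100.0):
    lhs = math.lgamma(z + 0.5) - math.lgamma(z)
    rhs = 0.5 * (math.log(z) - 1 / (4 * z))
    print(z, lhs, rhs, lhs - rhs)
```

Already at $z=5$ the two sides agree to about five decimal places, consistent with an $O(z^{-2})$ error.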

Consequently

$$f(z) = 1 - \frac{1}{z}\left(\frac{\Gamma\left(z+1/2\right)}{\Gamma(z)}\right)^2 \approx 1 - \exp\left(-\frac{1}{4z}\right) = \frac{1}{4z} + O(z^{-2}).$$

The exact expression for $\delta s$ in the question is $\sigma\sqrt{f((n-1)/2)}$ whereas the approximation produced by first-order error propagation is $\sigma\sqrt{g((n-1)/2)}$ with

$$g(z) = \frac{1}{4z}.$$

Consequently their ratio is

$$\frac{\sigma\sqrt{g((n-1)/2)}}{\sigma\sqrt{f((n-1)/2)}} = \sqrt{\frac{g((n-1)/2)}{f((n-1)/2)}} = \sqrt{\frac{1/(2n-2)}{1/(2n-2) + O(n^{-2})}}=1+O(n^{-1}).$$
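This asymptotic agreement is easy to verify numerically; a sketch (function names `f` and `g` follow the notation above, computed via log-gammas to avoid overflow):

```python
import math

def f(z):
    """Exact: 1 - (1/z) * (Gamma(z + 1/2) / Gamma(z))^2."""
    log_ratio = math.lgamma(z + 0.5) - math.lgamma(z)
    return 1.0 - math.exp(2 * log_ratio) / z

def g(z):
    """First-order error-propagation approximation."""
    return 1.0 / (4.0 * z)

for z in (2.0, 10.0, 50.0):
    print(z, f(z), g(z), g(z) / f(z))
```

The printed ratios decrease toward $1$ as $z$ (equivalently $n$) grows, with $g$ always slightly larger than $f$.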

Thus, the two expressions closely agree for large $n.$ But just how large must $n$ be? Not very, as this plot of the ratio shows:

[Figure: the ratio of the approximate to the exact formula, plotted up to $n=100$]

The first (approximate) formula is always too large, but the ratio rapidly drops toward $1$ as $n$ increases.

Indeed, further analysis indicates the relative error (the difference between the ratio and $1$) is $1/(8n)+O(n^{-2}).$ Even for $n=2$ this isn't bad.
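The $1/(8n)$ behaviour of the relative error can likewise be tabulated directly; in this sketch the helper `delta_ratio` (my name) returns the ratio of the approximate to the exact $\delta s/\sigma$:

```python
import math

def delta_ratio(n):
    """Ratio of the error-propagation formula to the exact delta s / sigma."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    exact = math.sqrt(1 - 2 / (n - 1) * math.exp(2 * log_ratio))
    approx = 1 / math.sqrt(2 * (n - 1))
    return approx / exact

for n in (2, 5, 10, 50, 100):
    print(n, delta_ratio(n) - 1, 1 / (8 * n))
```

By $n=100$ the observed relative error is already within a few parts in $10^5$ of the predicted $1/(8n)$.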

whuber