How many samples are needed to approximate the standard deviation?

Question

Assume you have a random variable $X$ for which you want to compute a $1/4$-approximation $\sigma'$ of $\sigma = \sqrt{\operatorname{Var}(X)}$ (i.e. $|\sigma - \sigma'| \leq \sigma/4$). You can only get samples drawn according to $X$ (you have no other information about $X$). My question is: how many samples do you need to achieve this task with constant success probability?

If the question were instead to get an estimate $\mu'$ of the expectation $\mu = E(X)$, then I could take $\mu'$ to be the average of $t$ samples, and the Chebyshev inequality gives $P\left(|\mu' - \mu| \geq \frac{|\mu|}{4}\right) \leq \frac{16 \sigma^2}{t \mu^2}$. Thus, $t = \frac{48\sigma^2}{\mu^2}$ samples make the right-hand side equal to $1/3$, which gives a $1/4$-approximation with probability at least $2/3$. Is there a similar result for the standard deviation?
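As a sanity check on the Chebyshev count for the mean, here is a minimal Python sketch. The helper names `chebyshev_sample_count` and `empirical_failure_rate` are hypothetical, and the Exponential(1) test distribution (for which $\mu = \sigma = 1$) is only an assumption chosen for illustration. In practice $\mu$ and $\sigma$ are unknown, so this checks the bound rather than providing a usable procedure.

```python
import math
import random
from fractions import Fraction


def chebyshev_sample_count(mu, sigma, rel_err=Fraction(1, 4), fail_prob=Fraction(1, 3)):
    """Smallest integer t with sigma^2 / (t * (rel_err * mu)^2) <= fail_prob."""
    # Exact rational arithmetic avoids floating-point rounding inside ceil().
    ratio = Fraction(sigma) ** 2 / (fail_prob * (rel_err * Fraction(mu)) ** 2)
    return math.ceil(ratio)


def empirical_failure_rate(draw, mu, t, rel_err=0.25, trials=2000, seed=0):
    """Fraction of trials in which the mean of t samples misses mu by >= rel_err * |mu|."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        sample_mean = sum(draw(rng) for _ in range(t)) / t
        if abs(sample_mean - mu) >= rel_err * abs(mu):
            failures += 1
    return failures / trials


# Illustrative assumption: X ~ Exponential(1), so mu = sigma = 1.
t = chebyshev_sample_count(1, 1)
rate = empirical_failure_rate(lambda rng: rng.expovariate(1.0), 1.0, t)
print(t, rate)
```

The observed failure rate should sit well below the guaranteed $1/3$, which reflects how conservative the Chebyshev bound tends to be for a specific, well-behaved distribution.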

One problem with applying the Chebyshev inequality is that you know neither $\mu$ nor $\sigma$. The only conceivable reason to apply this inequality is that you know absolutely nothing about the distribution of $X$, so that puts you into a quandary. If you do know something about $X$--such as having a few preliminary samples or even a theoretical prediction--then it's almost always the case that the Chebyshev inequality will suggest taking far more samples than you really need. — whuber, Mar 02 '18 at 15:25
I am aware of the "weakness" of the Chebyshev inequality, but it gives an optimal time complexity for estimating $\mu$ when nothing is known about $X$, so I'm wondering if something similar exists for $\sigma$. — permanganate, Mar 02 '18 at 15:28
If all you're interested in is asymptotic timing, then $\sigma$ is no different from $\mu$. They are estimated in similar ways and the same principles apply. The main insight is that the variance of an estimate of $\sigma$ depends on the fourth central moment of $X$. — whuber, Mar 02 '18 at 16:58
Even in the normal distribution case, $\sigma$ is biased and $\operatorname{Var}(X)$ is not; see [link](https://stats.stackexchange.com/questions/249688/why-are-we-using-a-biased-and-misleading-standard-deviation-formula-for-sigma). So, any inequality containing $\sigma$ as opposed to $\sigma^2$ will be biased as a function of $n$. — Carl, Mar 02 '18 at 18:44
@Carl That makes no sense. You appear to confound the SD of a random variable, $\sigma$, with some particular estimator of it. — whuber, Mar 23 '18 at 13:50
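The remark in the comments about the fourth central moment can be made concrete with a short derivation. This is only a sketch under an added assumption that is not in the question: that $\mu_4 = E\left[(X-\mu)^4\right]$ is finite. The symbols $S^2$ (the unbiased sample variance of $t$ i.i.d. samples $X_1, \dots, X_t$ with mean $\bar{X}$) and $\mu_4$ are introduced here for illustration, and the argument gives a sufficient sample count, not a proof of optimality.

$$
\begin{aligned}
S^2 &= \frac{1}{t-1}\sum_{i=1}^{t} (X_i - \bar{X})^2, \qquad E(S^2) = \sigma^2, \\
\operatorname{Var}(S^2) &= \frac{1}{t}\left(\mu_4 - \frac{t-3}{t-1}\,\sigma^4\right) \leq \frac{\mu_4}{t} \quad \text{for } t \geq 3, \\
P\left(|S^2 - \sigma^2| \geq \frac{\sigma^2}{4}\right) &\leq \frac{16\,\operatorname{Var}(S^2)}{\sigma^4} \leq \frac{16\,\mu_4}{t\,\sigma^4}, \\
|S - \sigma| &= \frac{|S^2 - \sigma^2|}{S + \sigma} \leq \frac{|S^2 - \sigma^2|}{\sigma}.
\end{aligned}
$$

Under that assumption, taking $\sigma' = S$ with $t = \frac{48\mu_4}{\sigma^4}$ samples gives $|\sigma - \sigma'| \leq \sigma/4$ with probability at least $2/3$. This mirrors the bound for the mean, with the kurtosis $\mu_4/\sigma^4$ playing the role of $\sigma^2/\mu^2$.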

0 Answers