I have come across the following statement:

$(*)$ The width of CLT-based 99% confidence intervals is $6\sigma n^{-1/2}$.

How does one derive this? Is there a general formula?

I tried to check it myself:

The CLT states that a sum $S_n$ of iid rv's approaches the following distribution.

$$p(S_n = s) = \frac{1}{\sqrt{2\pi n\sigma^2}}\exp\left(-\frac{(s-n\mu)^2}{2n\sigma^2}\right)$$

Now as I understand it, $(*)$ is basically saying that $$p(S_n\leq -3\sigma n^{-1/2}) + p(S_n \geq 3\sigma n^{-1/2}) = 0.01$$ where I have assumed (wlog?) $\mu = 0$. Thus I was hoping to find $$\int_{-\infty}^{-3\sigma n^{-1/2}}p(s)ds = 0.005$$ with $n$ and $\sigma$ magically cancelling out. However, this doesn't seem to be the case.

Can someone explain?

akkarin
  • You're on the right track, but the statement is incorrect, unless you generously read "$6$" as being an approximation to $5$! The correct value to use in place of $6$ is $\Phi^{-1}(1-.01/2)-\Phi^{-1}(.01/2) \approx 5.152$, where $\Phi$ is the cumulative standard Normal distribution function. Most likely the statement was intended for small samples (where $\Phi$ must be replaced by a Student $t$ distribution), but then the applicability of the CLT becomes least plausible. – whuber Apr 13 '17 at 15:45
  • And the CLT may not be accurate enough for some distributions when n < 100,000. Not very safe to use the CLT very often. Among other things the CLT assumes that you have the right measure of dispersion. SD is not valid for many distributions (e.g., log-normal). – Frank Harrell Apr 13 '17 at 15:59
  • @akkarin Can you give a source for the claim? – Glen_b Apr 13 '17 at 22:45
  • Thank you for your answers! @whuber: So my idea is right and if I replace 6 by the correct number, the $n$ will somehow cancel out? I find it hard to do this practically. Glen_b: It was from my lecture notes, maybe I got it wrong. – akkarin Apr 18 '17 at 15:34

1 Answer

Presumably, a "CLT-based confidence interval" of confidence $1-\alpha$ for a sample $x_1,\ldots, x_n$ drawn randomly from a distribution with mean $\mu$ and variance $\sigma^2$ means the interval for $\mu$ having endpoints $\bar x \pm Z_{\alpha/2} s/\sqrt{n}$ where $\bar x$ is the sample mean, $s^2$ is the unbiased estimator of $\sigma^2,$ and $Z_{\alpha/2}$ is the $\alpha/2$ quantile of the standard Normal distribution.

This interval, as a function of the sample, also is random; but we may evaluate its expected width,

$$E\left[|(\bar x - Z_{\alpha/2}s/\sqrt{n}) - (\bar x + Z_{\alpha/2}s/\sqrt{n})|\right] = 2 |Z_{\alpha/2}|\,E[s]/\sqrt{n}.$$

We know $E[s]$ is finite (because $E[s^2]=\sigma^2$ is assumed finite for the CLT) and underestimates $|\sigma|$ (by virtue of Jensen's Inequality). For guidance, note that when the distribution truly is Normal, the bias in this estimate is of the order $1/n.$ For anything but the smallest sample sizes, then, we may take $E[s]\approx \sigma$ in this analysis.
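To get a feel for how quickly $E[s]$ approaches $\sigma$ in the Normal case, here is a minimal sketch that evaluates the standard closed form $E[s]/\sigma = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$. The function name `expected_s_over_sigma` is only illustrative, and the formula applies to Normal data only; other distributions can behave differently.

```python
import math

def expected_s_over_sigma(n: int) -> float:
    """E[s]/sigma for n iid Normal observations (Normal case only)."""
    if n < 2:
        raise ValueError("need at least two observations")
    # sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2); lgamma avoids overflow for large n
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

for n in (2, 5, 10, 30, 100):
    ratio = expected_s_over_sigma(n)
    # n * (1 - ratio) settling near a constant is what "bias of order 1/n" means
    print(n, round(ratio, 4), round(n * (1 - ratio), 3))
```

The last column levelling off as $n$ grows is the numerical counterpart of the order-$1/n$ bias mentioned above.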

Consequently, the expected width of the confidence interval is $2 |Z_{\alpha/2}\,\sigma|/\sqrt{n}.$

When $1-\alpha=0.99,$ $\alpha/2 = 0.005$ and $Z_{\alpha/2} \approx -2.58,$ giving an expected width of $5.16\,\sigma/\sqrt{n}.$ Anyone wishing to state a conservative rule of thumb will wish to overestimate this result and might want to add another fudge factor for all the approximations implied by use of the CLT and replacing $E[s]$ by $\sigma.$ They will also want to introduce simple, memorable coefficients. These considerations lead to replacing $5.16$ by the next larger integer, $6,$ yielding the statement in the question.
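As a quick numerical check of the coefficient, the following sketch computes $2\,|Z_{\alpha/2}|$ for a few confidence levels using only the Python standard library. The helper name `clt_width_coefficient` is made up for this illustration; it assumes $E[s]\approx\sigma$ as in the argument above.

```python
from statistics import NormalDist

def clt_width_coefficient(confidence: float) -> float:
    """Return c such that the expected CI width is roughly c * sigma / sqrt(n)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    alpha = 1.0 - confidence
    # Upper-tail standard Normal quantile, i.e. |Z_{alpha/2}|
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * z

for level in (0.90, 0.95, 0.99):
    print(level, round(clt_width_coefficient(level), 3))
```

For the 99% level this gives a coefficient slightly above $5.15$, which rounds up to the $6$ in the quoted rule of thumb.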

whuber