
I'm using the standard formula for the confidence interval for standard deviation:

SQRT(((N-1)*s^2)/(CHISQ.INV((1-Confidence)/2, N-1)))

By mistake I entered a very low confidence level and I noticed that the resulting interval did not include the calculated standard deviation. This happens when the confidence is sufficiently low (e.g. 10%).

Can someone explain the reason to me? I've double-checked the results with Minitab and they are correct.

amoeba
Matteo
  • For any confidence interval (even when the coverage probability is low) the true value of the parameter has a possibility of not being included in a single observed interval. This probability increases as the coverage probability decreases. – Michael R. Chernick Apr 04 '17 at 16:46
  • @Michael the question doesn't ask about the *true* sd value not being in the interval. The question asks about the *sample* sd being outside the interval. Your comment doesn't relate to what the question is asking. – Glen_b Apr 04 '17 at 16:48
  • This formula is derived in an answer at http://stats.stackexchange.com/a/76446/. The details--and the subsequent comment thread--point out some of the simplifying assumptions that were made, giving us some hints concerning why and how this interval would fail to cover the estimate. – whuber Apr 04 '17 at 16:59

1 Answer


[Aside: We're using a chi-squared distribution to obtain the confidence interval because this interval is obtained assuming we're sampling from a normal distribution.]
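
For reference, here is a sketch of where the interval comes from, assuming i.i.d. normal data with sample size $n$, writing $\nu=n-1$ for the degrees of freedom, $\alpha$ for one minus the confidence level, and $\chi^2_{p}$ for the $p$ quantile of a chi-squared distribution with $\nu$ d.f. Under that assumption $\nu s^2/\sigma^2$ has a $\chi^2_\nu$ distribution, so

$$
P\!\left(\chi^2_{\alpha/2} \le \frac{\nu s^2}{\sigma^2} \le \chi^2_{1-\alpha/2}\right) = 1-\alpha
\quad\Longrightarrow\quad
\frac{\nu s^2}{\chi^2_{1-\alpha/2}} \le \sigma^2 \le \frac{\nu s^2}{\chi^2_{\alpha/2}}
$$

Taking square roots gives the limits for $\sigma$. The spreadsheet formula in the question, which uses the lower-tail quantile at $(1-\text{Confidence})/2=\alpha/2$, corresponds to the upper limit.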

While the interval for $\sigma$ not including the observed sample value of $s$ might at first glance seem surprising, it occurs for the simple reason that the distribution of a chi-squared random variable doesn't have its median at its degrees of freedom (that's where its expected value is, but the distribution is skewed -- the median is below the d.f.). For example, with $\nu=3$, the expected value ($3$) is nearly at the 61st percentile.

Now for $\alpha$ near 1, $\alpha/2$ and $1-\alpha/2$ will both be very close to $\frac12$ and so the corresponding percentage points of a chi-square will be very close to the median. Consequently, if $\alpha$ is large enough (i.e. if the coverage of the interval, $1-\alpha$, is small enough), both percentage points can turn out to be below the mean, as in this example:

[Figure: density of a chi-squared distribution with 3 d.f., showing a symmetric 5% interval that lies entirely below the mean, 3.]

In the particular example above, the $0.475$ quantile and the $0.525$ quantile each cut off half of the tail area that totals $0.95$, leaving $0.05$ between those endpoints. We see that both quantiles are well below $\nu=3$ (which is way up past the $0.60$ quantile -- even a 20% CI would have this issue).
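
These numbers can be checked with a few lines of Python. The sketch below uses only the standard library and relies on the closed-form CDF that a chi-squared variable has when $\nu=3$; the helper names `chi2_cdf_3df` and `chi2_quantile_3df` are made up for this illustration, and the quantile is found by plain bisection rather than a library routine.

```python
import math

def chi2_cdf_3df(x):
    """CDF of a chi-squared variable with 3 d.f. (closed form for this case only)."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def chi2_quantile_3df(p, lo=0.0, hi=100.0):
    """Invert the CDF by bisection; p must lie strictly between 0 and 1."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_3df(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

nu = 3
q_lo = chi2_quantile_3df(0.475)   # lower end of the central 5% region
q_hi = chi2_quantile_3df(0.525)   # upper end of the central 5% region
q_20 = chi2_quantile_3df(0.60)    # upper end of the central 20% region
cdf_at_mean = chi2_cdf_3df(nu)    # how far up the distribution the mean sits

print("0.475 and 0.525 quantiles:", q_lo, q_hi)
print("0.60 quantile:", q_20)
print("CDF at the mean:", cdf_at_mean)
```

All three quantiles should print as smaller than 3, and the CDF at the mean should be a little above 0.6.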

As a result of that, the chi-square percentage points divided by the degrees of freedom can both be below 1. If that happens, this makes both ends of the interval for $\sigma$ larger than $s$ (the sample standard deviation).

In more detail --

The quantiles $\chi^2_{\alpha/2}/\nu$ and $\chi^2_{1-\alpha/2}/\nu$ are both below $1$. The interval for $\sigma^2$ is $(r_1 s^2,r_2 s^2)$ where $r_1=\frac{\nu}{\chi^2_{1-\alpha/2}}$ and $r_2=\frac{\nu}{\chi^2_{\alpha/2}}$ are the reciprocals of those quantities that are below $1$ (making the $r_i>1$).

This means that the interval for $\sigma^2$ lies entirely above $s^2$ ... and the interval for $\sigma$ is obtained by taking square roots of those limits, so the interval for $\sigma$ also doesn't include $s$.

Considered more directly, the interval for $\sigma$ is $(\sqrt{r_1} s,\sqrt{r_2} s)$ -- and both $\sqrt{r_i}$ values are in turn greater than $1$, so both limits exceed $s$.
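
As a rough illustration of the same calculation for general degrees of freedom, here is a standard-library Python sketch. The helpers `chi2_cdf`, `chi2_quantile` and `sd_interval` are hypothetical names written for this example (the CDF uses the power series for the regularized lower incomplete gamma function, and the quantile uses bisection), and the values `s = 2.0` and `n = 4` are made up, not taken from the question.

```python
import math

def chi2_cdf(x, df):
    """Chi-squared CDF via the power series for P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    term, total, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= t / (a + k)
        total += term
    return total * math.exp(-t + a * math.log(t) - math.lgamma(a + 1.0))

def chi2_quantile(p, df):
    """Invert the CDF by bisection; p must lie strictly between 0 and 1."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 50.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sd_interval(s, n, confidence):
    """Equal-tailed interval for sigma, assuming normally distributed data."""
    nu = n - 1
    alpha = 1.0 - confidence
    lower = s * math.sqrt(nu / chi2_quantile(1 - alpha / 2, nu))  # sqrt(r_1) * s
    upper = s * math.sqrt(nu / chi2_quantile(alpha / 2, nu))      # sqrt(r_2) * s
    return lower, upper

s, n = 2.0, 4  # hypothetical sample sd and sample size (3 d.f.)
for conf in (0.95, 0.10, 0.05):
    lower, upper = sd_interval(s, n, conf)
    print(conf, lower, upper, "contains s:", lower <= s <= upper)
```

With these inputs the 95% interval should contain $s$, while the 10% and 5% intervals should both start above it. How low the confidence level has to be before this happens depends on $\nu$, since the skewness of the chi-squared distribution shrinks as the degrees of freedom grow.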

So while the expected value of $s^2$ is $\sigma^2$, $s^2$ is typically smaller than $\sigma^2$ (because the distribution of $s^2$ is skewed right). Correspondingly, you'd expect a very narrow interval for $\sigma^2$ to sit above the observed $s^2$. This carries through to the interval for $\sigma$.
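
A small simulation can make this concrete. The sketch below draws many normal samples and records how often $s^2$ falls below $\sigma^2$; the sample size, true standard deviation, seed and number of replications are all arbitrary choices for illustration.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable
sigma = 1.0      # true standard deviation (hypothetical)
n = 4            # sample size, giving 3 d.f.
reps = 20000     # number of simulated samples

below = 0
sum_s2 = 0.0
for _ in range(reps):
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)  # unbiased sample variance
    sum_s2 += s2
    if s2 < sigma ** 2:
        below += 1

frac_below = below / reps   # share of samples with s^2 below sigma^2
mean_s2 = sum_s2 / reps     # average of s^2 across samples

print("fraction with s^2 < sigma^2:", frac_below)
print("average s^2:", mean_s2)
```

The average of $s^2$ should land close to $\sigma^2=1$, while the fraction of samples with $s^2<\sigma^2$ should be clearly above one half (about 0.6 in theory for 3 d.f.).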


We should tend to see similar effects occur with other intervals that result from skewed distributions.

Glen_b
  • +1 but can't one construct CI around the mean, instead of around the median? Then this problem would be avoided. – amoeba May 22 '17 at 08:23
  • @amoeba This interval is not constructed "around the median" but constructed to have the same proportion in each tail (in effect from-the-ends-in, rather than from-the-median-out). You're right that it's not necessary to construct intervals with the same proportion in each tail; but that's what it seems was being done in the question. – Glen_b May 22 '17 at 12:31