How can the confidence interval for standard deviation not include the sample standard deviation?

Question

I'm using the standard formula for the confidence interval for standard deviation:

SQRT(((N-1)*s^2)/(CHISQ.INV((1-Confidence)/2, N-1)))

By mistake I entered a very low confidence level and I noticed that the resulting interval did not include the calculated standard deviation. This happens when the confidence is sufficiently low (e.g. 10%).

Can someone explain the reason? I've double-checked the results with Minitab and they are correct.

For any confidence interval (even when the coverage probability is low) the true value of the parameter has a possibility of not being included in a single observed interval. This probability increases as the coverage probability decreases. — Michael R. Chernick, Apr 04 '17 at 16:46
@Michael the question doesn't ask about the *true* sd value not being in the interval. The question asks about the *sample* sd being outside the interval. Your comment doesn't relate to what the question is asking. — Glen_b, Apr 04 '17 at 16:48
This formula is derived in an answer at http://stats.stackexchange.com/a/76446/. The details--and the subsequent comment thread--point out some of the simplifying assumptions that were made, giving us some hints concerning why and how this interval would fail to cover the estimate. — whuber, Apr 04 '17 at 16:59

score 10 · Answer 1 · edited Jun 11 '20 at 14:32

[Aside: We're using a chi-squared distribution to obtain the confidence interval because this interval is obtained assuming we're sampling from a normal distribution.]

While the interval for $\sigma$ not including the observed sample value of $s$ might at first glance seem surprising, it occurs for the simple reason that the distribution of a chi-squared random variable doesn't have its median at its degrees of freedom (that's where its expected value is, but the distribution is skewed -- the median is below the d.f.). For example, with $\nu=3$, the expected value ($3$) is nearly at the 61st percentile.
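The 61st-percentile figure can be checked numerically. The following is a minimal sketch using only Python's standard library; `chi2_cdf` is a helper written for this illustration (not a library function) that evaluates the chi-squared CDF through the power series for the regularized lower incomplete gamma function.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for X ~ chi-squared(df), via the incomplete-gamma power series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    # Multiply by z^a * exp(-z) / Gamma(a), computed on the log scale.
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

nu = 3
p_at_mean = chi2_cdf(nu, nu)   # probability of falling below the mean
print(f"P(chi2_{nu} <= {nu}) = {p_at_mean:.3f}")
```

The printed probability should be a little under 0.61, so the mean sits well above the median (the 0.50 quantile).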

Now for $\alpha$ near 1, $\alpha/2$ and $1-\alpha/2$ will both be very close to $\frac12$ and so the corresponding percentage points of a chi-square will be very close to the median. Consequently, if $\alpha$ is large enough (i.e. if the coverage of the interval, $1-\alpha$, is small enough), both percentage points can turn out to be below the mean. Take a 5% interval with $\nu=3$ as an example.

In this example, the $0.475$ quantile and the $0.525$ quantile each cut off half of the tail area that totals $0.95$, leaving $0.05$ between those endpoints. Both quantiles are well below $\nu=3$ (which is way up past the $0.60$ quantile -- even a 20% CI would have this issue).
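The two quantiles for the $\nu=3$ case can be computed directly. This is a rough sketch using only the standard library: `chi2_cdf` and `chi2_ppf` are helpers written for this illustration (a series-based CDF and a bisection-based inverse), not functions from any statistics package.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for X ~ chi-squared(df), via the incomplete-gamma power series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_ppf(p, df):
    """Quantile function: invert chi2_cdf by bisection (assumes 0 < p < 1)."""
    lo, hi = 0.0, df + 10.0
    while chi2_cdf(hi, df) < p:      # widen the bracket if needed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

nu = 3
q_lo = chi2_ppf(0.475, nu)   # lower percentage point for a 5% interval
q_hi = chi2_ppf(0.525, nu)   # upper percentage point for a 5% interval
print(f"0.475 quantile: {q_lo:.3f}, 0.525 quantile: {q_hi:.3f}, mean: {nu}")
print("both below the mean:", q_hi < nu)
```

Both quantiles bracket the median and fall short of the mean, which is the condition that pushes the whole interval above $s$.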

As a result, the chi-square percentage points divided by the degrees of freedom can both be below 1. When that happens, both ends of the interval for $\sigma$ are larger than $s$ (the sample standard deviation).

In more detail --

The quantiles $\chi^2_{\alpha/2}/\nu$ and $\chi^2_{1-\alpha/2}/\nu$ are both below $1$. The interval for $\sigma^2$ is $(r_1 s^2,r_2 s^2)$ where $r_1=\frac{\nu}{\chi^2_{1-\alpha/2}}$ and $r_2=\frac{\nu}{\chi^2_{\alpha/2}}$ are the reciprocals of those quantities that are below $1$ (making the $r_i>1$).

This means that the interval for $\sigma^2$ lies entirely above $s^2$ ... and the interval for $\sigma$ is obtained by taking square roots of those limits, so the interval for $\sigma$ also doesn't include $s$.

Considered more directly, the interval for $\sigma$ is $(\sqrt{r_1} s,\sqrt{r_2} s)$ -- and both $\sqrt{r_i}$ values are in turn greater than $1$, so the interval for $\sigma$ also doesn't include $s$.
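To see the interval $(\sqrt{r_1} s,\sqrt{r_2} s)$ behave this way, here is a small sketch that computes it at two confidence levels. The sample size $N=4$ (so $\nu=3$) and $s=1$ are assumptions chosen for illustration, and `chi2_cdf`, `chi2_ppf` and `sd_interval` are hypothetical helpers written for this example using only the standard library.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for X ~ chi-squared(df), via the incomplete-gamma power series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_ppf(p, df):
    """Quantile function: invert chi2_cdf by bisection (assumes 0 < p < 1)."""
    lo, hi = 0.0, df + 10.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def sd_interval(s, n, confidence):
    """Equal-tailed interval for sigma, assuming normally distributed data."""
    nu = n - 1
    alpha = 1.0 - confidence
    r1 = nu / chi2_ppf(1.0 - alpha / 2.0, nu)   # multiplier for the lower limit
    r2 = nu / chi2_ppf(alpha / 2.0, nu)         # multiplier for the upper limit
    return s * math.sqrt(r1), s * math.sqrt(r2)

s, n = 1.0, 4                       # illustrative values, not from the question
ci10 = sd_interval(s, n, 0.10)      # very low confidence, as in the question
ci95 = sd_interval(s, n, 0.95)      # conventional confidence level
print("10% interval:", ci10, "contains s:", ci10[0] <= s <= ci10[1])
print("95% interval:", ci95, "contains s:", ci95[0] <= s <= ci95[1])
```

With these inputs the 10% interval should sit entirely above $s$, while the 95% interval should contain it.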

So while the expected value of $s^2$ is $\sigma^2$, $s^2$ is smaller than $\sigma^2$ more than half the time (because the distribution of $s^2$ is skewed right). Correspondingly, you'd expect a very narrow interval for $\sigma^2$ to sit above the observed $s^2$. This carries through to the interval for $\sigma$.
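A quick simulation illustrates the point that $s^2$ is unbiased for $\sigma^2$ yet usually falls below it. This is only a sketch: the sample size of 4, $\sigma=1$, the number of replications and the seed are all arbitrary choices made for the illustration.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance (divisor n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

random.seed(0)                       # fixed seed so the run is repeatable
sigma, n, reps = 1.0, 4, 20000       # illustrative settings (nu = n - 1 = 3)

variances = [
    sample_variance([random.gauss(0.0, sigma) for _ in range(n)])
    for _ in range(reps)
]

mean_s2 = sum(variances) / reps                               # close to sigma^2
frac_below = sum(v < sigma ** 2 for v in variances) / reps    # above one half

print(f"average s^2: {mean_s2:.3f}")
print(f"fraction of samples with s^2 < sigma^2: {frac_below:.3f}")
```

The average of $s^2$ should land near $\sigma^2 = 1$, while the fraction of samples with $s^2 < \sigma^2$ should be near the probability that a chi-squared variable with 3 d.f. falls below its mean, i.e. roughly 0.61 rather than 0.5.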

We should tend to see similar effects occur with other intervals that result from skewed distributions.

+1 but can't one construct CI around the mean, instead of around the median? Then this problem would be avoided. — amoeba, May 22 '17 at 08:23
@amoeba This interval is not constructed "around the median" but constructed to have the same proportion in each tail (in effect from-the-ends-in, rather than from-the-median-out). You're right that it's not necessary to construct intervals with the same proportion in each tail; but that's what it seems was being done in the question. — Glen_b, May 22 '17 at 12:31
