If $X_1, \ldots, X_n$ are i.i.d. normal, Student's theorem asserts that the pivotal quantity $$\frac{(n-1)S^2}{\sigma^2}$$ has a $\chi^2(n-1)$ distribution. Suppose I can find constants $a$ and $b$ such that $$P\left(a < \frac{(n-1)S^2}{\sigma^2} < b\right) = 1 - \alpha.$$ Then $$P\left(\frac{(n-1)S^2}{b} < \sigma^2 < \frac{(n-1)S^2}{a}\right) = 1 - \alpha,$$ so my confidence interval is $\left((n-1)S^2/b,\; (n-1)S^2/a\right).$ But what if the $X_i$ are not normal? How can I derive a confidence interval for the variance?
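For concreteness, here is a minimal R sketch of that normal-theory interval, assuming equal-tail chi-squared quantiles are used for $a$ and $b$ (the sample and level below are only placeholders):

n = 50;  alpha = 0.05
x = rnorm(n, mean = 10, sd = 2)          # illustrative normal sample
a = qchisq(alpha/2, n - 1)               # lower chi-squared quantile
b = qchisq(1 - alpha/2, n - 1)           # upper chi-squared quantile
c((n - 1)*var(x)/b, (n - 1)*var(x)/a)    # ((n-1)S^2/b, (n-1)S^2/a)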
- If you know the form of the population distribution, you could get somewhere. Alternatively you might be able to set up a bootstrap interval. – Glen_b Nov 08 '21 at 02:29
- @Glen_b Could you please expand on that answer, if possible? – TheProofIsTrivium Nov 08 '21 at 02:41
- I was not so much answering the question as seeking clarification while offering some reasons for the need for it. Could you clarify your question so it's clear which of those possibilities (or possibly something else) an answerer should address? Do you have some other distributional model in mind or not? – Glen_b Nov 08 '21 at 03:12
- Is there no way to do this for any distribution with finite second moment, like you can for the mean using the CLT? – TheProofIsTrivium Nov 08 '21 at 03:59
- Oh, you mean *asymptotically*? There's no hint of that in your question; the case you discuss there -- the chi-squared for the normal -- is small-sample, not asymptotic. Sure there's an asymptotic result ... one that depends on higher population moments. It's discussed in several posts on site already. If that's what you're after, fix your question -- but it might close as a duplicate. – Glen_b Nov 08 '21 at 04:35
- E.g., see the discussion of an asymptotic distribution here: [Asymptotic distribution of sample variance of non-normal sample](https://stats.stackexchange.com/q/105337/805); consequently (with some exceptions -- note the issue mentioned there with the binomial with $p=\frac12$), an asymptotic test could be constructed if the population kurtosis ($\mu_4/\sigma^4$) were known, or (by invoking additional theorems -- and relying on finiteness of moments up to the *eighth*) from an estimate of it. – Glen_b Nov 08 '21 at 07:32
- Ben suggests another asymptotic distribution here: [Sampling distribution of sample variance of non-normal iid r.v.s](https://stats.stackexchange.com/a/347476/805), based on the chi-squared rather than the normal. – Glen_b Nov 08 '21 at 07:41
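To make the asymptotic approach mentioned in the last two comments concrete, here is a rough R sketch. It assumes $\sqrt{n}\,(S^2 - \sigma^2)$ is approximately normal with variance $\mu_4 - \sigma^4$ and plugs in sample moments; it is a crude large-sample approximation, not an exact interval, and (as the comments note) it relies on finite higher moments and can fail in edge cases such as the binomial with $p=\tfrac12.$

set.seed(1)
x = rexp(100, 5)                         # placeholder sample; any data vector works here
n = length(x);  alpha = 0.05
s2 = var(x)                              # sample variance S^2
m4 = mean((x - mean(x))^4)               # sample fourth central moment
se = sqrt((m4 - s2^2)/n)                 # plug-in standard error of S^2
s2 + c(-1, 1)*qnorm(1 - alpha/2)*se      # approximate (1 - alpha) CI for sigma^2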
1 Answer
Following from @Glen_b's first comment, exact answers will depend on the distribution you have in mind. To pick an especially easy example, suppose we want a confidence interval for the variance $\sigma^2$ of an exponential population from which a random sample of size $n = 100$ has been drawn.
Exact confidence interval for exponential variance. For an exponential distribution with rate $\lambda,$ the mean is $\mu = 1/\lambda$ and the variance is $\sigma^2 = \mu^2.$ Because $\frac{\bar X}{\mu} \sim\mathsf{Gamma}(n, n)$ it is easy to pivot to find an exact 95% CI for $\mu$ as follows: $\left(\frac{\bar X}{U},\,\frac{\bar X}{L}\right),$ where $L$ and $U$ cut probability $0.025$ from the lower and upper tails, respectively, of $\mathsf{Gamma}(n,n)$ [parameters shape and rate].
In particular, suppose $\lambda = 5, \mu = 0.2, \sigma^2 = 0.04$ and $n = 100.$ Then we use R to take such a sample:
set.seed(2021)
x = rexp(100, 5)              # random sample of size 100 from Exp(rate = 5)
v.obs = var(x);  v.obs        # observed sample variance
[1] 0.04287759
a.obs = mean(x);  a.obs^2     # squared sample mean also estimates sigma^2 = mu^2
[1] 0.04004572
A good estimate of $\sigma^2$ is $\hat\sigma^2 = 0.04004572$ and the corresponding 95% CI of $\sigma^2$ is $(0.0276, 0.0603).$
mean(x)/qgamma(c(.975,.025), 100, 100)   # exact 95% CI for mu from the gamma pivot
[1] 0.1660300 0.2459494
CI = mean(x)/qgamma(c(.975,.025), 100, 100)
CI^2                                     # square the endpoints: 95% CI for sigma^2
     97.5%       2.5%
0.02760683 0.06032230
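As a quick sanity check, a simulation along these lines (a rough sketch with an arbitrary seed) should show coverage near the exact 95% for the squared interval:

set.seed(42)
cover = replicate(10^4, {
  x = rexp(100, 5)                              # new sample from the same population
  CI = mean(x)/qgamma(c(.975,.025), 100, 100)   # exact 95% CI for mu
  CI[1]^2 < 0.04  &&  0.04 < CI[2]^2            # does the squared CI cover sigma^2 = 0.04?
})
mean(cover)                                      # should be close to 0.95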
Parametric bootstrap CI for exponential variance. In an analytically more intricate setting, we might know that $\sigma^2 = \mu^2$ (or some other function of $\mu)$ and that $\mu$ is a scale parameter of the population distribution. But we might not know the functional form of the population distribution.
Then we might use a 95% parametric bootstrap CI for $\mu$ to get such a CI for $\sigma^2.$ For our exponential data the parametric bootstrap CI is $(0.0276, 0.0605),$ in good agreement with the exact CI above.
set.seed(1108)
r.re = replicate(10^5, mean(rexp(100, 1/a.obs))/a.obs)  # bootstrap distribution of the pivot Xbar*/mu.hat
UL = quantile(r.re, c(.975,.025))
CI.boot = a.obs/UL;  CI.boot
    97.5%      2.5%
0.1661410 0.2458883     # CI for pop mean
CI.boot^2
     97.5%       2.5%
0.02760282 0.06046103   # CI for pop variance
Nonparametric bootstrap CI for exponential variance. In general, with considerably less information about the population distribution, we might make a 95% nonparametric bootstrap CI for $\sigma^2$ by resampling from the data x with replacement and using differences involving the sample variance, obtaining $(0.0247, 0.0602).$
set.seed(1234)
d.re = replicate(2000, var(sample(x,100,rep=T)) - v.obs)  # bootstrap differences V* - v.obs
UL = quantile(d.re, c(.975,.025))
v.obs - UL
97.5% 2.5%
0.02466290 0.06021674 # CI for pop variance
Addendum: The same method works if you have a random sample y of size 100 from $\mathsf{Unif}(0,1).$ First, take the sample:
set.seed(1066)
y = runif(100)
vy.obs = var(y); vy.obs; 1/12
[1] 0.07305236 # "unlucky" sample variance
[1] 0.08333333 # exact population variance
The 95% nonparametric bootstrap CI is $(0.0603, 0.0871),$ which does happen to include $\sigma^2 = 1/12 \approx 0.0833,$ even though we got a sample with variance $S^2 = 0.0730.$
set.seed(1776)
dy.re = replicate(2000, var(sample(y,100,rep=T)) - vy.obs)
ULy = quantile(dy.re, c(.975,.025))
vy.obs - ULy
97.5% 2.5%
0.06038059 0.08714299
Notice that, unlike in the normal case from the question, the sample variance here is not based on a sufficient statistic.
