
Here is a little brainteaser:

Suppose we know $x|(\mu,\sigma^2) \sim N(\mu, \sigma^2)$ and that $\sigma^2$ can either take on the value 1 or 2 on any particular draw from the distribution (i.e., there is a 50/50 chance that it will be 1 or 2). The goal is to construct a confidence interval with exactly 95% coverage for $\mu$.

There are (at least) two angles of attack here:

  1. We know that the coverage of $x \pm 1.96 \times \sqrt{1}$ is too low and the coverage of $x \pm 1.96 \times \sqrt{2}$ is too high. But on any draw, we could randomly pick one or the other CI (i.e., we use a randomized CI). The problem is then figuring out with what probability we should pick the first or second interval, so that we end up with 95% coverage in the long run.
  2. We know that there is some value $\tilde{\sigma}^2$ between 1 and 2 that will give us the desired 95% coverage if we compute the CI with $x \pm 1.96 \times \sqrt{\tilde{\sigma}^2}$. The problem is then finding this $\tilde{\sigma}^2$ value.

And just in case: This is not homework. If you care to check my profile and go to my website, you'll find that my school days are long over. I actually just constructed this little exercise myself and figured others may enjoy trying to solve it. I'll post an answer in due time.

Wolfgang

1 Answer


It's been a while since I've done things like this but I think you would argue as follows.

First, your distribution is a finite mixture of two normals (for details, see: What is the variance of the weighted mixture of two gaussians?).

Following the logic there, we have $f_Y(y) = .5 f_{X_1}(y) + .5 f_{X_2}(y)$, where $Y$ is the mixture and $X_1 \sim N(\mu, 1)$ and $X_2 \sim N(\mu, 2)$ are its two components.

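That means that $Var(Y) = p_1\sigma_1^2 + p_2\sigma_2^2 + p_1p_2(\mu_1-\mu_2)^2 = .5 \cdot 1 + .5 \cdot 2 + .25(\mu_1 - \mu_2)^2 = 1.5 + .25(\mu_1-\mu_2)^2 = 1.5$, where the last step holds because both components have the same mean ($\mu_1 = \mu_2 = \mu$).

Similarly, $E(Y) = .5E(X_1) + .5E(X_2) = .5\mu + .5\mu = \mu$.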

So now we have the mean and the variance. However, we can NOT assume that a mixture of two normals is still normal -- and generally this is not the case.

Thus, I don't think your confidence bound for $x$ will be of the form $x \pm 1.96\sqrt{1.5}$. Using the central limit theorem, however, you can calculate an approximate confidence bound for $\bar{x}$ (the mean of $n$ independent draws) as $\bar{x} \pm 1.96 \sqrt{1.5 /n}$.

As an example of the failure of normality, please see the R simulation below, where I simulate data from the process you described. Even though the plot from hist(data) looks fairly normal, the p-value of the Shapiro-Wilk test shows that the data are not normal.

> data <- unlist(lapply(1:5000, function(x){
+   if(rbinom(1,1,.5)){
+     return(rnorm(n = 1, mean = 0, sd = 1))
+   }else{
+     return(rnorm(n = 1, mean = 0, sd = sqrt(2)))
+   }
+ }))
> 
> mean(data)
[1] 0.02801353
> var(data)
[1] 1.480294
> shapiro.test(data)

    Shapiro-Wilk normality test

data:  data
W = 0.99833, p-value = 3.78e-05
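
The CLT-based interval for $\bar{x}$ mentioned earlier in this answer can be checked by simulation in the same spirit. What follows is only a rough sketch: the helper `draw_mixture`, the sample size `n = 50`, the number of replications, and the seed are my own assumptions, not part of the original question.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Draw n values from the 50/50 mixture: each draw has variance 1 or 2.
# (draw_mixture is a hypothetical helper introduced for this sketch.)
draw_mixture <- function(n, mu = 0) {
  sigma <- sqrt(sample(c(1, 2), size = n, replace = TRUE))
  rnorm(n, mean = mu, sd = sigma)
}

n    <- 50     # sample size per replication (assumed)
reps <- 10000  # number of replications (assumed)
mu   <- 0      # true mean used in the simulation

# For each replication, check whether xbar +/- 1.96 * sqrt(1.5 / n) covers mu.
covered <- replicate(reps, {
  x <- draw_mixture(n, mu)
  abs(mean(x) - mu) <= 1.96 * sqrt(1.5 / n)
})

mean(covered)  # empirical coverage of the interval for the mean
```

If the approximation is reasonable, `mean(covered)` should come out close to 0.95. Note that this concerns the interval for $\bar{x}$, not the single-draw interval asked about in the question.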
user1357015
  • You are correct in that the marginal distribution of $x$ is a mixture and it is certainly not normal. Still, one can construct a CI of the form $x \pm 1.96 \sqrt{\tilde{\sigma}^2}$ such that it will have exactly 95% coverage. – Wolfgang Nov 18 '16 at 19:11
  • @Wolfgang: I believe the 1.96 comes from the fact that you are using the normal distribution. If it's not normal, why would 1.96 still be valid? – user1357015 Nov 18 '16 at 19:14
  • The mixture is not normal, but conditional on $\sigma^2$, we still have normal distributions. So the trick is to use this fact. Then we can work out the coverage of $x \pm 1.96 \sqrt{1}$ and $x \pm 1.96 \sqrt{2}$ and try to use this for either solution 1) or 2). – Wolfgang Nov 18 '16 at 19:20
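
The conditioning argument in the last comment can be sketched numerically. This is a minimal illustration, not the poster's own solution: the function `coverage` and the variable names are hypothetical, and `qnorm(0.975)` is used in place of the rounded 1.96. Conditional on $\sigma^2$, the interval $x \pm 1.96\sqrt{s^2}$ covers $\mu$ with probability $2\Phi(1.96\sqrt{s^2}/\sigma) - 1$, and the overall coverage is the average over the two equally likely values of $\sigma^2$.

```r
# Overall coverage of x +/- z * sqrt(s2) when sigma^2 is 1 or 2 with
# probability 1/2 each. Conditional on sigma, (x - mu) / sigma is standard
# normal, so the conditional coverage is 2 * pnorm(z * sqrt(s2) / sigma) - 1.
coverage <- function(s2, z = qnorm(0.975)) {
  sigma <- sqrt(c(1, 2))
  mean(2 * pnorm(z * sqrt(s2) / sigma) - 1)
}

cov_narrow <- coverage(1)  # interval built with sigma^2 = 1
cov_wide   <- coverage(2)  # interval built with sigma^2 = 2

# Approach 1: probability p_narrow of picking the narrow interval so that
# p_narrow * cov_narrow + (1 - p_narrow) * cov_wide = 0.95.
p_narrow <- (cov_wide - 0.95) / (cov_wide - cov_narrow)

# Approach 2: solve coverage(s2) = 0.95 for s2 between 1 and 2.
s2_tilde <- uniroot(function(s2) coverage(s2) - 0.95,
                    interval = c(1, 2), tol = 1e-10)$root

print(c(cov_narrow = cov_narrow, cov_wide = cov_wide,
        p_narrow = p_narrow, s2_tilde = s2_tilde))
```

The two printed coverages should bracket 0.95, which is what makes both the randomized interval and the root search for $\tilde{\sigma}^2$ feasible.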