Finding a critical region for a mixture

Question

Let $X_1, X_2$ be two iid random variables with the Normal $N(\theta, 1)$ distribution. Further, consider a Bernoulli random variable $V$ with $P(V=1) = \frac{1}{4}$ that is independent of $X_1, X_2$.

Define $X_3$ as

$$X_3 = \begin{cases} X_1, & \text{if } V=0 \\ X_2, & \text{if } V=1 \end{cases}$$

For testing the hypothesis $H_0: \theta = 0$ versus $H_1: \theta = 1$,

reject $H_0$ if $\frac{X_1 + X_2 + X_3}{3} > C$.

Find $C$ such that the test size is 0.05.

In this question I could understand only two things: first, that $V$ is an indicator variable with a Bernoulli distribution, and second, that the statistic should be converted to $Z$ before finding the critical region.

Other than that I don't know how to proceed. Please help.

My attempt

$X_3 = X_1 \cdot P(V=0) + X_2 \cdot P(V=1)$

$E(X_3) = E(X_1 \cdot P(V=0)) + E(X_2 \cdot P(V=1))$

$E(X_3) = (\theta \cdot 3/4) + (\theta \cdot 1/4)$

$E(X_3) = \theta$

Now $Var(X_3) = Var(X_1 \cdot 3/4) + Var(X_2 \cdot 1/4)$

$Var(X_3) = \frac{9}{16} + \frac{1}{16}$

$Var(X_3) = \frac{10}{16}$

$\frac{X_1 + X_2 + X_3}{1} \sim N(3\theta, ??)$

$\frac{X_1 + X_2 + X_3}{3} \sim N(\theta, ??)$ Please help me proceed.

This exercise asks you to find an upper quantile of $Y=(X_1+X_2+X_3)/3.$ One straightforward approach, then, would begin by finding an expression for the CDF of $Y.$ What do you obtain? — whuber, Jul 12 '21 at 17:57
@simran You're mistaken; $X_3$ is not an indicator, nor does it have a Bernoulli distribution. $V$ is an indicator. — Glen_b, Jul 13 '21 at 02:44
@glen_b Then what would be the distribution of $X_3$? Can you please suggest a book on inference that could help to tackle such questions, and the others I have posted, if you could visit my profile? — simran, Jul 13 '21 at 03:37
I commented so that you would clarify/correct details in your question. Please do so. — Glen_b, Jul 13 '21 at 07:03
Sometimes one can succeed in course work by guessing a formula or result -- but in the real world, that is rarely effective. Thus, what is most important is to explain *why* you think the sum of the $X_i$ might have a particular distribution: how did you derive it? — whuber, Jul 13 '21 at 14:36
@whuber I have tried my best , could you please help to proceed — simran, Jul 14 '21 at 07:11
You are missing a step: how do you know $(X_1+X_2+X_3)/3$ has a Normal distribution? To appreciate that this issue is not entirely trivial, consider the version of this problem where $X_2\sim\mathcal{N}(\theta+10,1).$ — whuber, Jul 14 '21 at 13:42
@whuber because all three of them are normal so the sum will also be normal, — simran, Jul 14 '21 at 14:43
That's not a sufficient reason: you need all three to be *jointly* Normal. These are not jointly Normal! — whuber, Jul 14 '21 at 15:04

score 2 · Accepted Answer · answered Jul 14 '21 at 20:19

A mixture of identically distributed variables has the same distribution -- but is not independent of its component variables.

Some of the information in this exercise is just a distraction. Stripped to its essential underlying idea, the problem is the following:

Let $X_1,X_2$ be independent identically distributed random variables. Use an independent Bernoulli$(p)$ variable $V$ to define $X_3 = X_1$ when $V=0$ and otherwise $X_3=X_2.$ What is the distribution of $Y = X_1+X_2+X_3$?

The iid assumption implies $(X_1,X_2)$ and $(X_2,X_1)$ have the same distribution: that is, we may switch the roles of the $X_i.$ Upon doing this with the random variable $2X_1+X_2$ it is immediate that

$2X_1+X_2$ and $X_1 + 2X_2$ have the same distribution.

Let this common distribution function be $F.$ This means only that for any number $y,$

$$\Pr(2X_1+X_2 \le y) = F(y) = \Pr(X_1+2X_2 \le y).$$

The distribution function of $Y$ is found by studying the two possibilities for $V,$ using the facts that when $V=0,$ $Y=2X_1+X_2$ and when $V=1,$ $Y=X_1+2X_2.$ The law of total probability asserts

$$\begin{aligned} \Pr(Y\le y) &= \Pr(V=0)\Pr(2X_1 + X_2 \le y) + \Pr(V=1)\Pr(X_1+2X_2\le y)\\ & = (1-p)F(y) + pF(y) \\ &= F(y). \end{aligned}$$
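The first equality relies on $V$ being independent of $(X_1, X_2).$ As a sketch of that step, with conditioning notation that is added here and does not appear in the derivation above:

$$\begin{aligned} \Pr(Y\le y) &= \Pr(V=0)\Pr(Y\le y \mid V=0) + \Pr(V=1)\Pr(Y\le y \mid V=1)\\ &= \Pr(V=0)\Pr(2X_1+X_2\le y \mid V=0) + \Pr(V=1)\Pr(X_1+2X_2\le y \mid V=1)\\ &= \Pr(V=0)\Pr(2X_1+X_2\le y) + \Pr(V=1)\Pr(X_1+2X_2\le y). \end{aligned}$$

The last line drops the conditioning because events determined by $(X_1, X_2)$ alone are independent of $V.$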

Thus,

$Y$ has the same distribution as $2X_1+X_2$ and $X_1 + 2X_2.$

The rest is mopping up: in your circumstance, where the $X_i$ have Normal distributions and are independent, $Y$ must therefore have a Normal distribution and its parameters are easily computed. From that it's simple to find the value of $C$ in the question.
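One way the mopping up might go, as a hedged sketch rather than a definitive solution: since $Y$ has the distribution of $2X_1+X_2,$ it is Normal with mean $3\theta$ and variance $2^2+1^2=5,$ so $Y/3$ is Normal with mean $\theta$ and variance $5/9.$ The Python below (standard library only) computes the upper 5% point under $\theta=0$ and checks the size by simulation. The function names `critical_value` and `empirical_size`, the seed, and the number of replications are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def critical_value(alpha=0.05, theta0=0.0):
    """Upper-alpha quantile of (X1 + X2 + X3)/3 when theta = theta0.

    Assumes X1 + X2 + X3 has the same law as 2*X1 + X2, which is
    Normal with mean 3*theta0 and variance 2**2 + 1**2 = 5.
    """
    sd_mean = sqrt(5) / 3  # standard deviation of the average, sqrt(5/9)
    return NormalDist(mu=theta0, sigma=sd_mean).inv_cdf(1 - alpha)

def empirical_size(c, n=200_000, theta0=0.0, p=0.25, seed=0):
    """Monte Carlo estimate of Pr((X1 + X2 + X3)/3 > c) when theta = theta0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n):
        x1 = rng.gauss(theta0, 1)
        x2 = rng.gauss(theta0, 1)
        # X3 copies X2 with probability p (V = 1), otherwise X1 (V = 0).
        x3 = x2 if rng.random() < p else x1
        if (x1 + x2 + x3) / 3 > c:
            rejections += 1
    return rejections / n

C = critical_value()
size = empirical_size(C)
print(f"C = {C:.4f}, simulated size = {size:.4f}")
```

Under these assumptions $C = 1.645\sqrt{5}/3 \approx 1.226,$ and the simulated rejection rate should land close to 0.05.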

The joint distribution of all the relevant variables is unusual and instructive. In this scatterplot matrix of 2000 realizations (with $\theta=5$) I have colored the points where $V=0$ in blue and those where $V=1$ in red. Only $X_1$ and $X_2$ are independent, as indicated by their circular-cloud scatterplots. The scatterplots with $Y$ include some singular parts on the diagonal, reflecting the fact that often $Y$ equals $2X_1+X_2$ or $X_1+2X_2.$
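The dependence described above can also be checked numerically. The following is a minimal sketch in Python (standard library only), not the code behind the plot. The function names `simulate` and `pearson`, the seed, and the choice $p=1/4$ (taken from the question) are assumptions made for illustration.

```python
import random
from math import sqrt

def simulate(n=2000, theta=5.0, p=0.25, seed=1):
    """Draw n realizations of (X1, X2, V, X3, Y) with Y = X1 + X2 + X3."""
    rng = random.Random(seed)
    x1 = [rng.gauss(theta, 1) for _ in range(n)]
    x2 = [rng.gauss(theta, 1) for _ in range(n)]
    v = [1 if rng.random() < p else 0 for _ in range(n)]
    # X3 copies X1 when V = 0 and X2 when V = 1.
    x3 = [b if flag else a for a, b, flag in zip(x1, x2, v)]
    y = [a + b + c for a, b, c in zip(x1, x2, x3)]
    return x1, x2, v, x3, y

def pearson(u, w):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_w = sum(u) / n, sum(w) / n
    s_uw = sum((a - mean_u) * (b - mean_w) for a, b in zip(u, w))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_ww = sum((b - mean_w) ** 2 for b in w)
    return s_uw / sqrt(s_uu * s_ww)

x1, x2, v, x3, y = simulate()
r12 = pearson(x1, x2)  # X1 and X2 are independent
r13 = pearson(x1, x3)  # X3 equals X1 whenever V = 0
r23 = pearson(x2, x3)  # X3 equals X2 whenever V = 1
print(f"corr(X1,X2)={r12:.2f}  corr(X1,X3)={r13:.2f}  corr(X2,X3)={r23:.2f}")
```

With $p=1/4,$ the theoretical correlations are $0$ for $(X_1,X_2),$ $1-p=3/4$ for $(X_1,X_3),$ and $p=1/4$ for $(X_2,X_3),$ so the sample values should fall near those numbers.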

jassis · Answer 2 · 2021-07-14T18:35:36.797

1

I think that,

$$X_3 = \begin{cases} X_1, & V = 0 \\ X_2, & V = 1 \end{cases}$$

Therefore,

$$X_3 \sim N(\theta, 1)$$

Check if $X_3$ is independent of $X_1$ and $X_2$.

So what you have presented is the average of 3 iid observations of the variable $X \sim N(\theta, 1)$.

It is known that $\bar{x} \sim N(\theta, \frac{\sigma}{\sqrt{n}})$, with the second parameter being the standard deviation, which in this case gives $\bar{x} \sim N(\theta, \frac{1}{\sqrt{3}})$.

So I reject $H_0$ if $P(\bar{x} > c) = P(Z > \frac{c - \theta }{1/\sqrt{3}}) = P(Z > \sqrt{3}(c - \theta)) = 0.05$.

It follows then that

$$c - \theta = \frac{1.64}{\sqrt{3}} \Rightarrow c = \theta + \frac{1.64}{\sqrt{3}}$$

edited Jul 14 '21 at 18:35

answered Jul 14 '21 at 18:14


$X_3$ obviously is *not* independent of $X_1$ and $X_2:$ with 100% probability, it is equal to one or the other of them! – whuber Jul 14 '21 at 18:24
It makes sense @whuber. I did a simulation in R and noticed I got a wrong root. I have to think more about this problem. – jassis Jul 14 '21 at 18:55
A scatterplot matrix is informative. In `R`: `n – whuber Jul 14 '21 at 19:39
