
I have a sample of size $n=1$ (a single observation $x_1$) from a random variable $X\sim N(\mu,\sigma^2)$. The variance $\sigma^2$ is known, but the expectation $\mu$ is unknown. I would like to test the null hypothesis $$H_0\colon \quad \mu=0 \quad \text{or} \quad \mu=2$$ against the alternative $$H_1\colon \quad \mu=1 \quad \text{or} \quad \mu=3.$$ My significance level is $\alpha=0.05$.

Questions: How do I test this using the frequentist approach? Specifically:

  1. What test statistic may I use?
  2. How do I find the rejection region?
  3. How do I calculate the $p$-value?

This is related to my continuing efforts to understand the $p$-value, as in

  1. "Is the following textbook definition of $p$-value correct?",
  2. "Does $p$-value ever depend on the alternative?",
  3. "Defining extremeness of test statistic and defining $p$-value for a two-sided test" and
  4. "$p$-value: Fisherian vs. contemporary frequentist definitions".

and to my studies of likelihood ratio testing:

  1. "Asymptotic null distribution of the LR statistic with point null and point alternative"
  2. "Failing to obtain $\chi^2(1)$ asymptotic distribution under $H_0$ in a likelihood ratio test"
  • Likelihood ratio test: https://en.wikipedia.org/wiki/Likelihood-ratio_test – stans Jan 26 '21 at 08:36
  • @stans, looks like a great starting point! This should give me the test statistic in a straightforward way. But how do I find the rejection region and the $p$-value? Can I simply (i) find the likelihood maximizer among $\{0,2\}$, denote it $\hat\mu_0$, (ii) find the likelihood maximizer among $\{1,3\}$, denote it $\hat\mu_1$ and (iii) proceed as if $H_0\colon \mu=\hat\mu_0$ and $H_1\colon \mu=\hat\mu_1$? Could you recommend a textbook that introduces LR testing in an accessible way and treats cases like mine in addition to the simplest cases? – Richard Hardy Jan 26 '21 at 08:50
  • I guess you could simulate the distribution of the exact LR statistic under the conservative subcase of the null hypothesis. By "conservative" I mean the one which is harder to detect: $\mu = 2$. Then the rejection region is the right tail of the distribution of the LR statistic... Regarding the ML estimates of $\mu$, they want you to find i) the estimate on $\{0, 2\}$ and ii) the estimate on $\{0, 1, 2, 3\}$... I learned the LR test from Lehmann's "Theory of Point Estimation" but there are probably friendlier books. Unfortunately, I do not know of a reference tackling your exact problem. – stans Jan 26 '21 at 09:04
  • Did I understand your question correctly? – stans Jan 26 '21 at 09:04
  • @stans, I suppose you did. What I was asking about is, suppose I first figure out which of $\{0,2\}$ maximizes the likelihood and which of $\{1,3\}$ does. Then I only keep the maximizers, discarding the other option in each case. Then I proceed as if my $H_0$ and $H_1$ contained only the maximizers, so they become a point null and a point alternative. From there, the remaining task of carrying out an LR test is easy. I am mainly asking about the mechanics, i.e. do they work the way I am describing? The intuition and interpretation are another thing, I guess. – Richard Hardy Jan 26 '21 at 09:11
  • Sorry, Lehmann's "Testing Statistical Hypotheses", which is a sequel to "Theory of Point Estimation". – stans Jan 26 '21 at 09:11
  • @stans, indeed, I was going to ask about that. In their point estimation book, there is hardly anything about LR testing. – Richard Hardy Jan 26 '21 at 09:12
  • I do not think you are proposing the same thing as the kosher LR test. In your case, the LR statistic will stay the same but the perceived distribution of the LR statistic under null will be slightly different (off). The true distribution would have to account for both subcases of the alternative hypothesis. If $\sigma$ is large (subcases are hard to distinguish by MLE), this is likely to be an important issue. – stans Jan 26 '21 at 09:16
  • @stans, this is slightly off topic, but I (naively) do not think it is possible to obtain the distribution of the test statistic under the null unless we specify a prior distribution for $\{0,2\}$. Am I mistaken? – Richard Hardy Jan 26 '21 at 09:32
  • Well, if we assume $\mu = 2$, then we can simulate 1e6 samples of size $N$ by rnorm($N$ * 1e6) * $\sigma$ + $\mu$. Then on each sample we can perform two MLE routines and calculate the LR statistic according to the formula in the link (-2 * log-ratio). Then we have 1e6 simulated data points, which constitute the distribution used in the p-value calculation (see the sketch after this thread)... Or am I missing your point? – stans Jan 26 '21 at 09:40
  • @stans, this distribution may be used in the $p$-value calculation (I am no expert in LR testing, so I cannot tell), but it is not the null distribution of the test statistic. It assumes $\mu=2$, but under the null this is not necessarily the case. The actual null distribution is a mixture of two normals with $\mu=0$ and $\mu=2$ with unknown weights. The weights would be known if we knew the prior on $\{0,2\}$, but we do not. – Richard Hardy Jan 26 '21 at 09:56
  • $\mu = 2$ is a subcase of $H_0$ which is harder to distinguish from $H_1$. That is why it should be used for the purposes of assessing the distribution of the test statistic under $H_0$. This principle (of using the boundary of $H_0$ closest to $H_1$) is discussed by Lehmann. Just like if you were to test $H_0: \mu\leq 5$ vs $H_1: \mu > 5$ you would calculate the p-value under $\mu = 5$. – stans Jan 26 '21 at 10:02
  • @stans, I am not arguing against that, I just wanted to point out we cannot obtain the distribution of the test statistic under the null. I suppose it is only logical that we obtain something else instead, namely, the most useful one among the feasible substitutes. In any case, thank you a lot for your comments so far! It would be of great help if you found the time to write up an answer including some notes on my subquestions 2. and 3., alongside 1. that we have already discussed in some detail. – Richard Hardy Jan 26 '21 at 10:12
  • You're welcome, Richard. – stans Jan 26 '21 at 10:15
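
A minimal R sketch of the simulation stans describes above, assuming $n=1$ and the illustrative values $\sigma=1$ and $x_1=1.3$ (neither is given in the thread); the null distribution of the LR statistic is approximated under the least favorable subcase $\mu=2$:

```r
set.seed(1)
sigma <- 1

## LR statistic for an observation x: likelihood maximized over H0 = {0, 2}
## divided by the likelihood maximized over H0 u H1 = {0, 1, 2, 3}
lr <- function(x) {
  num <- pmax(dnorm(x, 0, sigma), dnorm(x, 2, sigma))
  num / pmax(num, dnorm(x, 1, sigma), dnorm(x, 3, sigma))
}

## Simulate the statistic under the least favorable null subcase mu = 2
sims <- lr(rnorm(1e6, mean = 2, sd = sigma))

## 5% rejection region: reject H0 when lr(x1) falls below crit
crit <- quantile(sims, 0.05)

## Simulated p-value for the illustrative observation x1 = 1.3
mean(sims <= lr(1.3))
```

Small LR values speak against $H_0$, so the test rejects in the left tail of the simulated LR distribution; equivalently, one can work with $-2\log\text{LR}$ and reject in the right tail, as in stans's comment.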

1 Answer


Some intuition can be gained from considering a likelihood ratio (LR) test. The following interactive graph shows how the LR varies with $\sigma$. Three snapshots are provided below.

[Figure: the likelihoods under $H_0$ and under $H_0\cup H_1$, and their ratio, plotted as functions of $x_1$ for three values of $\sigma$.]

The dotted red line is the likelihood under $H_0$, $\text{L}(x_1\mid \mu=0 \ \text{or} \ \mu=2) = \max_{\mu\in\{0,2\}} \text{L}(x_1\mid\mu)$.
The dashed blue line is the likelihood under $H_0 \cup H_1$, $\text{L}(x_1\mid \mu=0 \ \text{or} \ \mu=1 \ \text{or} \ \mu=2 \ \text{or} \ \mu=3) = \max_{\mu\in\{0,1,2,3\}} \text{L}(x_1\mid\mu)$.
The solid black line is the likelihood ratio, $\text{LR}(x_1\mid\dots) = \frac{ \text{L}(x_1\mid \mu=0 \ \text{or} \ \mu=2) }{ \text{L}(x_1\mid \mu=0 \ \text{or} \ \mu=1 \ \text{or} \ \mu=2 \ \text{or} \ \mu=3) }$.

The top figure is for $\sigma=0.25$. The middle figure is for $\sigma=0.5$. The bottom figure is for $\sigma=1$.
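
A short R sketch that reproduces one panel of the figure (an illustration, not the code behind the original interactive graph); `sigma` selects the panel:

```r
sigma <- 0.5                                   # 0.25, 0.5, or 1 for the three panels
x   <- seq(-2, 5, by = 0.01)                   # grid of possible observations x1
L0  <- pmax(dnorm(x, 0, sigma), dnorm(x, 2, sigma))       # likelihood under H0
L01 <- pmax(L0, dnorm(x, 1, sigma), dnorm(x, 3, sigma))   # likelihood under H0 u H1
plot(x, L0 / L01, type = "l", ylim = c(0, max(1, L01)),
     xlab = expression(x[1]), ylab = "")       # solid black: likelihood ratio
lines(x, L0,  lty = 3, col = "red")            # dotted red: likelihood under H0
lines(x, L01, lty = 2, col = "blue")           # dashed blue: likelihood under H0 u H1
```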

Observations

  • For small $\sigma$ we have good precision, and the LR test has two rejection regions (RRs): one around $1$ and one well beyond $2$. As $\sigma$ grows, the RR around $1$ shrinks and eventually disappears (for a fixed significance level $\alpha$), while the RR to the right of $2$ begins further and further to the right.
  • The precise boundaries of RRs depend on the significance level in addition to $\sigma$.
    (I know I have specified $\alpha=0.05$, but this is a general comment.)
  • The $p$-value is the probability, under the null distribution of $X$ (in practice approximated by the least favorable subcase $\mu=2$, as discussed in the comments), of the set of $x$ such that $\text{LR}(x\mid\dots)\leq\text{LR}(x_1\mid\dots)$ for the particular $x_1$ that we have observed; a numerical sketch follows this list.
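
As an illustration of the last bullet, a grid approximation in R of that probability, again under the least favorable null subcase $\mu=2$ and with illustrative values $\sigma=0.5$ and $x_1=1.3$:

```r
sigma <- 0.5
lr <- function(x) {                                  # same LR statistic as in the comment sketch
  num <- pmax(dnorm(x, 0, sigma), dnorm(x, 2, sigma))
  num / pmax(num, dnorm(x, 1, sigma), dnorm(x, 3, sigma))
}
x1   <- 1.3                                          # the observed data point (illustrative)
grid <- seq(-6, 10, by = 1e-3)                       # fine grid over the sample space
extreme <- lr(grid) <= lr(x1)                        # region at least as extreme as x1
pval <- sum(dnorm(grid[extreme], 2, sigma)) * 1e-3   # null-density mass of that region
pval
```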