
Suppose $\{f(\cdot,\theta) : \theta \in \mathbb{R}^p\}$ is a statistical model satisfying the conditions for Wilks' theorem, and that we have a hypothesis test of the form: $$H_0: \theta_p >0$$ $$H_1: \theta \in \mathbb{R}^p$$

Clearly, $H_0$ is a submodel of $H_1$, but its dimension is not well defined. Does Wilks' theorem still apply, and if so, how many degrees of freedom does the asymptotic $\chi^2$ distribution have?

If not, are there other asymptotic results that can be used instead?

qwerty

1 Answer


Wilks' theorem basically says that any problem of this sort behaves asymptotically like a Gaussian location parameter, so let's look at a Gaussian location parameter for simplicity. Life is also simpler if you make the null $\theta_p\geq 0$ rather than $\theta_p>0$, so it's a closed subset (otherwise there may not be a constrained MLE under the null).

Start with $p=1$. We have $X\sim N(\mu,1)$ and we're testing the null $\mu\geq 0$ against the unrestricted alternative. The -2 log likelihood ratio (up to constants) is the residual sum of squares under the null ($D_0=\sum_{i=1}^n (X_i-\hat\mu)^2$, with $\hat\mu = \max (\bar X,0)$) minus the same thing under the alternative ($D_A=\sum_{i=1}^n (X_i-\bar X)^2$).

The distribution of $D_0-D_A$ under $\mu=0$ is $\chi^2_1$ if $\bar X<0$ and 0 (which you can think of as $\chi^2_0$) if $\bar X\geq 0$. Each case has probability $\frac12$, so the overall distribution is $\frac{1}{2}\chi^2_0+\frac{1}{2}\chi^2_1$.
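A quick way to sanity-check that 50:50 mixture is by simulation. This is just an illustrative sketch, not part of the original argument; the sample size, seed, and cutoff are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 100, 50_000

# Under mu = 0: xbar ~ N(0, 1/n), muhat = max(xbar, 0), and the
# LRT statistic reduces to D0 - DA = n * min(xbar, 0)^2.
xbar = rng.normal(0.0, 1.0, size=reps) / np.sqrt(n)
lrt = n * np.minimum(xbar, 0.0) ** 2

# Compare tail probabilities with the 1/2 chi2_0 + 1/2 chi2_1 mixture
# (the chi2_0 atom contributes nothing above any q > 0).
q = 2.706  # ~95th percentile of the mixture = 90th percentile of chi2_1
print("empirical P(LRT > q):", np.mean(lrt > q))           # ~0.05
print("mixture   P(LRT > q):", 0.5 * stats.chi2.sf(q, 1))  # ~0.05
```

One practical consequence: using the naive $\chi^2_1$ critical value here would halve the actual size of the test.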

With $p=2$ the same thing happens, only it's more complicated. Each element of $\hat\mu$ is the corresponding element of $\bar X$ if that's positive, and otherwise is zero. Conditional on which quadrant $\bar X$ ends up in, $D_0-D_A$ has a $\chi^2$ distribution with degrees of freedom equal to the number of coordinates truncated to zero:

  • If $\bar X$ is in the (+,+) quadrant, then $\hat\mu=\bar X$ and $D_0-D_A=0$.
  • If $\bar X$ is in the (+,-) quadrant, then $\hat\mu=(\bar X_1, 0)$ and $D_0-D_A\sim\chi^2_1$.
  • If $\bar X$ is in the (-,+) quadrant, then $\hat\mu=(0,\bar X_2)$ and $D_0-D_A\sim\chi^2_1$.
  • If $\bar X$ is in the (-,-) quadrant, then $\hat\mu=(0,0)$ and $D_0-D_A\sim\chi^2_2$.

So, $D_0-D_A\sim \frac{1}{4}\chi^2_0+\frac{2}{4}\chi^2_1+\frac{1}{4}\chi^2_2$, and you can see how the pattern will generalise.
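The same kind of simulation sketch (again my own illustration, with arbitrary $n$ and seed) confirms the $\frac14,\frac24,\frac14$ weights; the constrained MLE is just coordinatewise truncation at zero:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 100, 50_000

# Under mu = (0, 0): xbar ~ N(0, I_2/n), muhat truncates each
# coordinate at zero, so D0 - DA = n * sum_j min(xbar_j, 0)^2.
xbar = rng.normal(0.0, 1.0, size=(reps, 2)) / np.sqrt(n)
lrt = n * np.sum(np.minimum(xbar, 0.0) ** 2, axis=1)

# Survival function of 1/4 chi2_0 + 2/4 chi2_1 + 1/4 chi2_2:
def mix_sf(q):
    return 0.5 * stats.chi2.sf(q, 1) + 0.25 * stats.chi2.sf(q, 2)

for q in (1.0, 3.0, 5.0):
    print(f"q={q}: empirical {np.mean(lrt > q):.4f}, mixture {mix_sf(q):.4f}")
```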

You get similar asymptotics for null hypotheses that are other intersections of half-spaces, though in general it need not be easy to work out the mixing probabilities. This goes back to Chernoff (1954) for likelihood ratio tests, with more work by Self and Liang (1987) and extensions to M-estimation by Geyer (1994).

A real example where the mixing probabilities had to be computed by simulation arose in my MSc thesis (Lumley, 1995): testing constant disease risk against risk that decreases with distance from a specified point.
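For what it's worth, here is a minimal sketch of how such mixing probabilities can be estimated by simulation when there's no closed form. Everything in it is a made-up illustration: the null is the nonnegative orthant in $p=3$ dimensions with an assumed non-diagonal information matrix, so the weights are no longer the binomial $(\frac18,\frac38,\frac38,\frac18)$. The projection onto the orthant in the information metric is a nonnegative least-squares problem:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)

# Made-up (assumed) Fisher information for p = 3; any positive
# definite matrix would do for this illustration.
info = np.array([[1.0, 0.5, 0.2],
                 [0.5, 1.0, 0.5],
                 [0.2, 0.5, 1.0]])
R = np.linalg.cholesky(info).T      # info = R.T @ R

reps = 20_000
counts = np.zeros(4)                # weights on chi2_0, ..., chi2_3
for _ in range(reps):
    z = rng.multivariate_normal(np.zeros(3), np.linalg.inv(info))
    # Project z onto {x >= 0} in the information metric:
    # minimise (x - z)' info (x - z) s.t. x >= 0, i.e. NNLS.
    x, _ = nnls(R, R @ z)
    counts[np.sum(x <= 1e-10)] += 1  # count active constraints

print("estimated mixing weights:", counts / reps)
```

The mixing weight on $\chi^2_k$ is the probability that exactly $k$ constraints are active at the projection, which this estimates directly.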

Thomas Lumley
  • Thank you for the response. How would this change if, for example, we had the null hypothesis restricted to $[0,1]$ and the alternative hypothesis in $[0,\infty)$? – qwerty Apr 02 '21 at 14:31
  • With the same lower bound for null and alternative, the distribution of the likelihood ratio won't be affected by the lower bound, so it's the same. It would even be the same for a null of $[0,1]$ vs an unrestricted alternative, since asymptotically only one end of the null would matter. It's $p>1$ that's complicated. – Thomas Lumley Apr 02 '21 at 21:53