
Say I have two logistic models, the null model ($\omega_0$) and a model with one covariate ($\omega_1$). That is \begin{align} \omega_0: \quad \text{logit}(p_i) &= \beta_0 \\ \omega_1: \quad \text{logit}(p_i) &= \beta_0 + \beta_1 x_i \end{align} where $p_i = P(Y_i = 1)$ for some $Y_i \sim \text{Bernoulli}(p_i)$.

Shouldn't the Wald test for $\beta_1$ give the same result as the LRT for these nested models? I mean, aren't the null and alternative hypotheses of these two tests the same? For the Wald test we have \begin{equation} H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0, \end{equation} and for the LRT we have the test statistic $D(\omega_0) - D(\omega_1)$ and the hypotheses \begin{align} &H_0: \omega_0 \quad &\text{vs.} \quad &H_1: \omega_1 \\ &H_0: \beta_1 = 0 \quad &\text{vs.} \quad &H_1: \beta_1 \neq 0 \end{align} In my analysis I've observed that the p-value of the Wald test is $0.04$, while the LRT gives a p-value of $0.07$. That is, with a cut-off of $0.05$, the two tests lead me to different conclusions. I would expect these two tests to be asymptotically equivalent, since I have plenty of data.
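
For concreteness, here is a minimal sketch in Python (using statsmodels) of how I understand the two tests. The data is simulated purely for illustration, and the names (`fit0`, `fit1`) and all constants are my own assumptions, not from my actual analysis:

```python
# A minimal sketch of the two nested fits and both tests for beta_1.
# The simulated data and all constants here are illustrative assumptions.
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
x = rng.binomial(1, 0.3, size=n)
p = 1.0 / (1.0 + np.exp(-(-2.0 + 0.5 * x)))  # logit(p_i) = beta_0 + beta_1 x_i
y = rng.binomial(1, p)

fit0 = sm.Logit(y, np.ones((n, 1))).fit(disp=0)     # omega_0: intercept only
fit1 = sm.Logit(y, sm.add_constant(x)).fit(disp=0)  # omega_1: intercept + x

# Wald: z = beta1_hat / se(beta1_hat), referred to N(0, 1)
z = fit1.params[1] / fit1.bse[1]
p_wald = 2 * stats.norm.sf(abs(z))

# LRT: D(omega_0) - D(omega_1) = 2 * (llf_1 - llf_0), referred to chi^2_1
lrt = 2 * (fit1.llf - fit0.llf)
p_lrt = stats.chi2.sf(lrt, df=1)

print(p_wald, p_lrt)  # typically close, but not identical in finite samples
```

As I understand it, the Wald test uses only the curvature of the log-likelihood at $\hat\beta_1$, while the LRT compares the maximised log-likelihoods of the two models directly, so the two should coincide only insofar as the log-likelihood is approximately quadratic.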

So what is going on? Have I misunderstood the hypotheses in these two tests? Or is it just a coincidence that the two tests give p-values on either side of my cut-off level?

EDIT The vector of observations $\mathbf{y}$ and the corresponding covariate vector $\mathbf{x}$ both have $1.7 \times 10^6$ elements, which is why I believe the asymptotic results should apply.

However, I've found that there are only about 1800 cases where $y_i = 1$, and about 3000 cases where $x_i = 1$. Both $y_i$ and $x_i$ record very rare events that occur over a large time interval, which is why the vectors $\mathbf{y}$ and $\mathbf{x}$ have $1.7 \times 10^6$ elements. I realise that there are very few 1's compared to 0's in both $\mathbf{y}$ and $\mathbf{x}$. How does this affect the asymptotics?
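
To illustrate the rare-event structure, here is a hedged simulation sketch that matches my event counts; the effect size (an odds ratio of roughly 3) is an arbitrary assumption, and only the counts of 1's mimic my data:

```python
# A simulation sketch of the rare-event setting: n is huge, but the usable
# information comes from the ~1800 ones in y and ~3000 ones in x, not from n.
# The effect size (odds ratio ~ 3) is an arbitrary assumption for illustration.
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1_700_000
x = (rng.random(n) < 3000 / n).astype(float)  # roughly 3000 ones in x
base = 1800 / n                               # roughly 1800 ones in y overall
y = rng.binomial(1, np.where(x == 1, 3 * base, base))

fit0 = sm.Logit(y, np.ones((n, 1))).fit(disp=0)
fit1 = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

z = fit1.params[1] / fit1.bse[1]
lrt = 2 * (fit1.llf - fit0.llf)
print("Wald p:", 2 * stats.norm.sf(abs(z)))
print("LRT  p:", stats.chi2.sf(lrt, df=1))
# With this few events the log-likelihood can be visibly non-quadratic in
# beta_1, so the two p-values need not land on the same side of a cut-off.
```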

harisf
    Possible duplicate: http://stats.stackexchange.com/questions/193643/likelihood-ratio-vs-wald-test – jwimberley Dec 28 '16 at 14:05
  • Thanks for your comment, @jwimberley. My understanding of that question and its answers is that the difference was due to a relatively small sample size, when the normality of the MLEs can be questionable. But I believe that that doesn't apply in my case. So are there any other reasons, beside the data size, that could cause this difference? Most importantly, have I understood the hypotheses of these two tests correctly? – harisf Dec 28 '16 at 14:14
  • @harisf: Why not explain what your case is? – Scortchi - Reinstate Monica Dec 28 '16 at 14:19
  • @Scortchi $Y_i$ and $x_i$ are large binary sequences with $1.7 \times 10^{6}$ elements. Hence the reason I believe that asymptotic results apply. Would you say that's fair? Or since $Y_i$ is a binary sequence, and $n_i = 1$, I shouldn't view this as an asymptotic case? – harisf Dec 28 '16 at 14:30
  • The asymptotics also depend on the number of 0s and 1s and the variance of the predictor. – Maarten Buis Dec 28 '16 at 16:13
  • @harisf, possibly. What is x? What is the proportion y=0 vs 1? They should be equal 'at infinity', & 1.7x10^6 is awfully large, but it isn't quite infinite. See also: [Why do my p-values differ between logistic regression output, chi-squared test, and the confidence interval for the OR?](http://stats.stackexchange.com/q/144603/7290) – gung - Reinstate Monica Dec 28 '16 at 16:13
  • @gung so I've found that there are about 1800 cases where y = 1, and about 3000 cases where x = 1. Both y and x record very rare events that occur over a large time interval, hence why these vectors have $1.7 \times 10^6$ elements. I realise that there are very few 1's compared to 0's in both y and x. How does this affect the asymptotics? – harisf Dec 29 '16 at 14:10
  • That's valuable information about your situation, @harisf. Please edit that into the body of the question, so it isn't lost down in the comments. – gung - Reinstate Monica Dec 29 '16 at 19:21
  • Is this question a duplicate of http://stats.stackexchange.com/questions/48206/likelihood-ratio-test-or-z-test ? – Jeremy Miles Dec 30 '16 at 18:07
  • @JeremyMiles yes! Thank you for linking that question. The accepted answer of that question states that if the Wald test shows a significant effect, and the LRT doesn't, it may be due to a non-parabolic likelihood. What if it's the other way around? That is, what if the LRT shows a significant effect, and the Wald test doesn't? – harisf Jan 01 '17 at 17:07
  • This could have to do with the Hauck-Donner effect, search this site – kjetil b halvorsen Sep 08 '17 at 20:46

0 Answers