Why is the P value from a binomial test sometimes not consistent with the confidence interval of the proportion?

Question

The results below don't really make sense. According to the p-value I should reject $H_0$, yet the confidence interval contains the null value that I'm supposed to reject. Any ideas what's going on?

```
binom.test(51,235,(1/6),alternative="two.sided")

Exact binomial test

data:  51 and 235
number of successes = 51, number of trials = 235, p-value = 0.04375
alternative hypothesis: true probability of success is not equal to 0.1666667
95 percent confidence interval:
 0.1660633 0.2752684
sample estimates:
probability of success 
         0.2170213
```

See `?binom.test`, where in the Details section it is noted that "Confidence intervals are obtained by a procedure first given in Clopper and Pearson (1934). This guarantees that the confidence level is at least ‘conf.level’, but in general does not give the shortest-length confidence intervals." — Weihuang Wong, Jan 06 '17 at 22:52
@WeihuangWong: yes, I've noticed that. But whatever the procedure, I would still expect it to follow the Duality theorem stating that the CIs and p-values yield the same results. I.e., in our case, with the p-value being less than 5%, the null value should be outside the 95% CI. Otherwise what's the point of such an interval? — Lola, Jan 07 '17 at 08:09
General discussion: http://stats.stackexchange.com/questions/169141 — amoeba, Jan 07 '17 at 16:18
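
To see where the two numbers in the output come from, here is a minimal sketch in Python (standard library only, not the R code used in the question). It rests on two assumptions: that the two-sided p-value sums the probabilities of all outcomes no more likely than the observed one, and that the interval is the Clopper-Pearson interval with $\alpha/2$ in each tail. The function names are illustrative and not part of any library.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def exact_two_sided_p(x, n, p0):
    """Assumed rule: add up every outcome whose probability under p0
    does not exceed that of the observed count (small relative tolerance)."""
    d = binom_pmf(x, n, p0)
    probs = (binom_pmf(k, n, p0) for k in range(n + 1))
    return sum(q for q in probs if q <= d * (1 + 1e-7))

def bisect_root(f, lo=0.0, hi=1.0, iters=100):
    """Root of an increasing function f on (lo, hi) by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Equal-tailed exact interval; this sketch assumes 0 < x < n."""
    upper_tail = lambda p: sum(binom_pmf(k, n, p) for k in range(x, n + 1))
    lower_tail = lambda p: sum(binom_pmf(k, n, p) for k in range(0, x + 1))
    # Lower limit: P(X >= x | p) = alpha/2 (the upper tail increases with p).
    low = bisect_root(lambda p: upper_tail(p) - alpha / 2)
    # Upper limit: P(X <= x | p) = alpha/2 (the lower tail decreases with p).
    high = bisect_root(lambda p: alpha / 2 - lower_tail(p))
    return low, high

p_value = exact_two_sided_p(51, 235, 1 / 6)
ci_low, ci_high = clopper_pearson(51, 235)
print(p_value, ci_low, ci_high)  # compare with the R output above
```

Under these assumptions the p-value and the interval are built from two different two-sided rules, so nothing forces them to agree at the 5% level.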

Glen_b · Accepted Answer · 2017-01-07T09:25:19.913

In general there's no guarantee that results of hypothesis tests and inclusion in confidence intervals will agree.

In fact, the usual confidence interval for a proportion is a case where this is easy to see.

Imagine the sample size is large enough that we can treat both the test and the confidence interval sensibly using a normal approximation. This doesn't mean that's how the intervals were calculated in your output, but it will be sufficient to explain how the issue can occur.

Now the standard error of a sample proportion is $s_p=\sqrt{\frac{p(1-p)}{n}}$, but the $p$ used in it differs between the two procedures. The hypothesis test uses the hypothesized value (i.e. $p_0=1/6$, giving $s_{p_0}=0.02431$), while the confidence interval uses the sample estimate (i.e. $\hat{p}=51/235 \approx 0.217$, giving $s_{\hat{p}}=0.02689$, and hence a wider interval). As a result, $51/235\pm Z_{\alpha/2} s_{p_0}$ doesn't include $1/6$, while $51/235\pm Z_{\alpha/2} s_{\hat{p}}$ does. So if we used the standard error that the test used to conclude that the sample value was too far from the hypothesized value, the corresponding interval would also exclude the null value, but the CI calculation doesn't have a null value to base that off.
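
The arithmetic in the previous paragraph can be checked with a short sketch. It is written in Python with the standard library only (not the R used in the question), and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0 = 51, 235, 1 / 6
p_hat = x / n
z = NormalDist().inv_cdf(0.975)  # Z_{alpha/2} for alpha = 0.05

# Standard error under the null (what the normal-approximation test uses)
se_null = sqrt(p0 * (1 - p0) / n)
# Standard error at the sample estimate (what the usual interval uses)
se_hat = sqrt(p_hat * (1 - p_hat) / n)

# Interval centred at p_hat, built with each standard error in turn
interval_null_se = (p_hat - z * se_null, p_hat + z * se_null)
interval_hat_se = (p_hat - z * se_hat, p_hat + z * se_hat)

print("se_null =", round(se_null, 5), " se_hat =", round(se_hat, 5))
print("with se_null:", interval_null_se,
      "contains 1/6:", interval_null_se[0] <= p0 <= interval_null_se[1])
print("with se_hat: ", interval_hat_se,
      "contains 1/6:", interval_hat_se[0] <= p0 <= interval_hat_se[1])
```

The only difference between the two intervals is which $p$ goes into the standard error, and that alone decides whether $1/6$ falls inside.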

Yet the usual test and the confidence interval both have (to a good approximation) the required properties. If you want to organize things so that the two do correspond, that should be possible to achieve (see below for an example that usually works), but tests and confidence intervals being the same isn't something you should automatically expect to be the case.

* Note that if instead of the usual interval you consider the Wilson score interval (which gives an asymmetric interval), that interval would not include $1/6$. In effect, it corresponds to keeping $p$ in the standard error and solving a more complicated equation for the endpoints. At least in large samples it will generally be consistent with the test.
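
As a rough illustration of the note above, the Wilson score interval for these data can be computed from its standard closed form. This is a sketch in Python (standard library only, not R); `wilson_interval` is an illustrative name, not a library function.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(x, n, conf=0.95):
    """Wilson score interval: the set of p with
    |p_hat - p| <= z * sqrt(p * (1 - p) / n), solved for p."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p_hat = x / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half_width, center + half_width

low, high = wilson_interval(51, 235)
print(low, high, "contains 1/6:", low <= 1 / 6 <= high)
```

Because each endpoint uses the standard error evaluated at the endpoint itself, this interval is exactly the set of null values that the normal-approximation test would not reject.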

edited Jan 07 '17 at 09:25

answered Jan 07 '17 at 08:17

Glen_b


But what about the Duality Theorem? Doesn't it state that CIs and p-values coincide w.r.t. the conclusions that we draw from them? – Lola Jan 07 '17 at 08:24
That would make a good question of its own (discussing it would likely require a fairly big extension of this answer). If you do ask it as a question of its own, please state the precise formulation of the theorem that you're relying on (a complete proof is not required, just the conditions needed/assumed for it to be true), so that the conditions of the theorem you have can be related to the circumstances here (i.e. so it can be shown why the theorem doesn't hold for the interval in your output, and hence won't hold for every possible interval). If you do ask it, please also link back here. – Glen_b Jan 07 '17 at 09:32
+1 but this thread seems to be very close to http://stats.stackexchange.com/questions/173005 -- I would vote to close as a duplicate if not for your answer. – amoeba Jan 07 '17 at 16:19
`but the CI calculation doesn't have a null value to base that off` -- this bit is not very clear. It seems that it would be perfectly possible to make CI consistent with the test: simply define the CI as all possible nulls not rejected by the test. The issue here is clearly that this particular CI is not computed in this manner, but it's not impossible to imagine such a CI. – amoeba Jan 07 '17 at 16:23
@amoeba That would be a consonance interval rather than a confidence interval. While in many simple situations defining your interval as "the set of parameter values not rejected by a hypothesis test" works just fine (and makes the whole issue of comparing intervals and tests a tautology), more broadly such an interval doesn't necessarily have the desired coverage -- or even anything close to it. (Here's a somewhat related link you might [find interesting](http://andrewgelman.com/2013/06/24/why-it-doesnt-make-sense-in-general-to-form-confidence-intervals-by-inverting-hypothesis-tests/).) – Glen_b Jan 07 '17 at 21:30
@amoeba on the duplicate -- a while after I posted that I had a vague recollection of a previous post on this topic (not that one though, so there's probably another duplicate to find), but didn't locate it in the time I had available then. I have no problem closing this as duplicate (and have done so). If you think there's additional value in the answer here, one option is to link the original post back here as well (via a comment under the question). – Glen_b Jan 07 '17 at 21:34
OK, I added a link there. About the rest: I don't think I understand. I thought that if a test has size (exactly) $\alpha$ then forming a CI by taking all points not rejected by this test will have coverage $1-\alpha$, almost tautologically. Is it wrong? I don't quite get Gelman's post either. He talks about a situation when a test can reject all possible values of $\theta$ (leading to a nonsensically empty CI), but this is not how tests usually work. Do you have a concrete example in mind to illustrate his point? – amoeba Jan 08 '17 at 00:03
On your sentence after "understand", it depends on what you mean by "not rejected by this test" -- this relates to Lola's followup question about the duality theorem (specifically why it doesn't apply in the situation she posts about in her original question), and why (for example) the Wilson score interval does effectively correspond to the test. I think you somewhat misunderstand the point of Gelman's example; the test is fine at the null, but not at the alternative (which is where you're calculating the CI from, and why they don't correspond)... ctd – Glen_b Jan 08 '17 at 00:18
ctd... if that doesn't get its own question I may have to post one myself (and then answer it if nobody else does). Briefly, the (nonetheless valid for large samples) CI procedure used in the question doesn't do what would be necessary to make it correspond. – Glen_b Jan 08 '17 at 00:20
