I'm testing whether three independent experiments have the same probability of success ($H_0: p_1 = p_2 = p_3$). For $f_1 = f_2 = f_3 = 0$ and $n_1 = n_2 = n_3 = 10$ (zero successes out of 10 trials in each experiment), the R function prop.test returns NaN for the p-value:

prop.test(c(0,0,0), c(10,10,10))$p.value
## [1] NaN
## Warning message:
## In prop.test(c(0, 0, 0), c(10, 10, 10)) :
##   Chi-squared approximation may be incorrect

If I replace one of the zeros by a very small value (e.g., $10^{-100}$), the p-value is 1, as expected. The same NaN occurs with $f_1 = f_2 = f_3 = 10$ (all successes), but it disappears when, e.g., $f_3 = 10 - 10^{-10}$. (I'm aware that the counts should really be integers.)
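
For concreteness, these cases can be reproduced directly (the numbers are the ones described above; each call also emits the chi-squared approximation warning):

prop.test(c(1e-100, 0, 0), c(10, 10, 10))$p.value       # 1, as described above
prop.test(c(10, 10, 10), c(10, 10, 10))$p.value         # NaN again
prop.test(c(10, 10, 10 - 1e-10), c(10, 10, 10))$p.value # back to 1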

Is this a glitch in R's implementation, or is the p-value undefined for some reason in the cases described above?

krlmlr
  • Do you see *any* evidence in these data that the experiments have different success probabilities?? If you still think you need to perform a test at all, then the help for `prop.test` directs you to use `binom.test`. – whuber Aug 14 '15 at 15:04
  • @whuber: Thanks. I'm trying to test that three stochastic algorithms behave identically; the behavior is measured by interpreting the output as Bernoulli experiments with fixed but unknown success probabilities. I'm repeating the experiment many times, and I'm looking at the distribution of the resulting p-values. (In other words, I'm trying to provide strong evidence **for** the null hypothesis.) The inputs shown above can (and do) occur, especially if the true success probability is small or large. Does this make more sense now? – krlmlr Aug 14 '15 at 15:24
  • @whuber: I find `prop.test` more convenient for my case, because I can simultaneously plug in the results of all stochastic algorithms under test. (I have more than three algorithms, in fact.) – krlmlr Aug 14 '15 at 15:25
  • Unfortunately, as you have found, `prop.test` is inapplicable in your case. That's why it issues a warning. You ought to heed it. – whuber Aug 14 '15 at 17:08
  • http://stats.stackexchange.com/questions/56300/which-test-to-use-to-compare-proportions-between-3-groups suggests a G test (although that includes $\log(o/e)$, which also requires an appropriate limit to be taken) – Ben Bolker Aug 14 '15 at 17:15
  • @whuber: Thanks a lot for your input. I'll try another test. – krlmlr Aug 14 '15 at 19:58
  • @BenBolker: I'll try [`Deducer::likelihood.test`](https://rforge.net/doc/packages/Deducer/likelihood.test.html), unless you can suggest something more suitable. – krlmlr Aug 14 '15 at 20:02
  • @whuber: Now I also see the effect if I ignore the warning: The computed p-values are wrong, and not distributed uniformly anymore under the null hypothesis (as they should be). Thanks again. – krlmlr Aug 19 '15 at 08:21
  • The p-values *cannot* have a uniform distribution, because only a (small) finite number of p-values is possible. If you really want a uniform distribution then you need to use a randomized test (which few people actually will do). The real problem is that the p-values are incorrect when based on a $\chi^2$ distribution: they only *asymptotically* have that distribution. The asymptotics assume all the individual cell populations grow arbitrarily large but your case is just the opposite. The p-values computed using an exact test will be correct. – whuber Aug 19 '15 at 12:11
  • @whuber: I'm using a fairly large number of trials ($10^4$), and the distribution of the p-values *appears* uniform, though the set of possible p-values is finite. (The ECDF goes well along the straight line, and the K-S test cannot reject uniformity.) Would you please take a look at the follow-up question that addresses this issue in more detail: http://stats.stackexchange.com/q/168012/6432 – krlmlr Aug 20 '15 at 08:43
  • I'm confused now, because in a previous comment you asserted "The computed p-values are ... not distributed uniformly anymore under the null hypothesis" and now you write "the distribution of the p-values appears uniform" (and you give good reasons to believe that). Another apparent contradiction is that the question still refers to *ten* trials, not $10^4$. I am at a loss even to know what you are trying to ask. – whuber Aug 20 '15 at 14:06
  • @whuber: Thanks for getting back. If I use `prop.test` for testing equality of proportions in an experiment with very low or very high success probability, the p-values are not uniform under $H_0$, presumably because they are only an approximation. I used 10 trials in the question to simplify the example; in reality it's $10^4$ trials. -- I think the question at hand is solved; my [new question](http://stats.stackexchange.com/q/168012/6432) has been closed with a reference to Fisher's exact test -- but that doesn't seem to work for me either, so I'm going to ask why in yet another question. – krlmlr Aug 20 '15 at 14:32
  • There are problems with such simplification, as I hope you can see: changing $10$ to $10^4$ substantially changes the nature of your question and can change what people recommend in their answers. – whuber Aug 20 '15 at 14:37
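
An aside on the exact route suggested in these comments: Fisher's exact test sidesteps the asymptotics entirely, and with the margins fixed, the all-zero table is the only table possible, so its exact p-value is 1. A minimal sketch for counts this small (assuming `fisher.test` accepts the degenerate table; the row labels are just for readability):

fisher.test(rbind(successes = c(0, 0, 0),
                  failures  = c(10, 10, 10)))$p.value
# exact p-value is 1: no other table is consistent with the margins

For much larger counts, such as $n_i = 10^4$, the exact computation can become expensive, so this is not a drop-in replacement in every setting.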

1 Answer

prop.test computes a chi-squared statistic, i.e. $\sum_i (\mathrm{obs}_i-\mathrm{expect}_i)^2/\mathrm{expect}_i$, which is derived from a Poisson model for the cell counts. Since the expected success counts in your case are all zero, the statistic contains 0/0 terms and comes out as NaN -- you're out of luck, even though the limiting value is, as you discovered, exactly what you want.

The actual code in prop.test():

STATISTIC <- sum((abs(x - E) - YATES)^2/E)

(Here x is the matrix of observed success and failure counts, E the corresponding matrix of expected counts, and YATES the Yates continuity correction term, which is zero here since the correction is only applied when there are at most two groups.)
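
To see where the NaN comes from, the calculation can be reconstructed by hand for the data in the question (a sketch of the internal steps, not the literal source):

x <- c(0, 0, 0); n <- c(10, 10, 10)
p   <- sum(x) / sum(n)            # pooled success probability: 0
obs <- cbind(x, n - x)            # observed successes and failures
E   <- cbind(n * p, n * (1 - p))  # expected counts: the first column is all zeros
sum((obs - E)^2 / E)              # the 0/0 terms make this NaN
## [1] NaN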

It's true that prop.test() doesn't say much explicitly about what it's doing: the only real clue is that the summary statistic is labeled as X-squared in the output.

And in any case, as @whuber says, you're getting a warning that this test is not reliable (which pops up when any of the expected cell counts are < 5). If you want to keep using it (at your own risk), just hack around it as you suggest in your question.
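
If you do, a hypothetical wrapper along the lines of the workaround in the question might look like this. Note that eps has to be large enough that n - eps still differs from n in floating point: near 10 the spacing of doubles is about 1.8e-15, so 10 - 1e-100 would silently round back to 10.

prop_test_nudged <- function(x, n, eps = 1e-9) {
  if (all(x == 0)) x[1] <- eps          # all failures: nudge one count up
  if (all(x == n)) x[1] <- x[1] - eps   # all successes: nudge one count down
  prop.test(x, n)                       # still emits the approximation warning
}
prop_test_nudged(c(0, 0, 0), c(10, 10, 10))$p.value
## [1] 1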

Ben Bolker
  • Thanks. I'm just wondering -- the statistic happens to be `NaN`, but from my very limited understanding I feel that the p-value for the *test* itself should be 1 (by definition of the p-value as the probability that the data are at least as "extreme" as the observed data). In this case, perhaps the R function should handle this (very special) corner case? But perhaps not -- nobody seems to care about corner cases ;-) – krlmlr Aug 14 '15 at 20:11
  • If so, ask about it -- and propose a change -- on R-help – kjetil b halvorsen Aug 14 '15 at 21:11
  • actually R-devel, I think – Ben Bolker Aug 14 '15 at 21:32