
Suppose I have done the following:

  • $n_1$ independent trials with an unknown success rate $p_1$ and observed $k_1$ successes.
  • $n_2$ independent trials with an unknown success rate $p_2$ and observed $k_2$ successes.

If now $p_1 = p_2 =: p$, with $p$ still unknown, the probability $p(k_2)$ of observing $k_2$ for a given $k_1$ (or vice versa) is proportional to $\int_0^1 B(n_1,p,k_1) B(n_2, p, k_2) \text{d}p = \frac{1}{n_1+n_2+1}\binom{n_1}{k_1}\binom{n_2}{k_2}\binom{n_1+n_2}{k_1+k_2}^{-1}$, so if I want to test for $p_1 \neq p_2$, I only need to check in which quantile of the corresponding distribution my observations lie.
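As a quick numerical sanity check of the integral identity above (a sketch; the function names are my own, and $B(n,p,k)$ denotes the binomial probability as in the question):

```python
from math import comb

def binom_pmf(n, p, k):
    """B(n, p, k): probability of k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def integral_closed_form(n1, k1, n2, k2):
    """Closed form of the integral from 0 to 1 of B(n1,p,k1) * B(n2,p,k2) dp."""
    return comb(n1, k1) * comb(n2, k2) / ((n1 + n2 + 1) * comb(n1 + n2, k1 + k2))

def integral_numeric(n1, k1, n2, k2, steps=100_000):
    """Midpoint-rule approximation of the same integral, for comparison."""
    h = 1.0 / steps
    return sum(binom_pmf(n1, (i + 0.5) * h, k1) * binom_pmf(n2, (i + 0.5) * h, k2)
               for i in range(steps)) * h

# The two agree to numerical precision, e.g. for n1=5, k1=2, n2=7, k2=3:
assert abs(integral_closed_form(5, 2, 7, 3) - integral_numeric(5, 2, 7, 3)) < 1e-8
```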

So much for reinventing the wheel. Now my problem is that I fail to find this in the literature, and thus I wish to know: What is the technical term for this test or something similar?

Wrzlprmft
  • Why not use the two-proportion z-test (http://en.wikipedia.org/wiki/Statistical_hypothesis_testing)? (If I understand your problem correctly.) – Verena Dec 19 '13 at 15:04
  • @ExpectoPatronum: At a quick glance, the biggest problem is that this test requires at least 5 successes and failures for each observation, which may not be satisfied in my application and also indicates that (unnecessary) approximations are made. – Wrzlprmft Dec 19 '13 at 16:20
  • OK, that is a problem, but most tests have similar requirements. – Verena Dec 19 '13 at 16:26
  • @ExpectoPatronum: Anyway, searching for an exact alternative to the two-proportion z-test, I found Fisher’s exact test, which looks very similar at first glance (though I have yet to look into it in detail). – Wrzlprmft Dec 19 '13 at 16:31
  • Actually you’re right: your formula is very similar to Fisher’s exact test: http://en.wikipedia.org/wiki/Fisher's_exact_test#Example The only difference is that you divide by $(n_1+n_2+1)$. – Verena Dec 19 '13 at 21:21
  • 1
    @ExpectoPatronum: The division does not matter, since the large term is only proportional to $p(k_2)$ and $(n_1+n_2+1)$ is exactly the normalisation constant. Anyway, I have now confirmed that this is Fisher’s Exact Test, which I found thanks to you. – Wrzlprmft Dec 19 '13 at 21:41
  • nice, I didn't know about this normalization constant. – Verena Dec 20 '13 at 05:08

2 Answers


The test statistic $p(k_2)$ is that of Fisher’s exact test.

Since $$\sum_{k_1+k_2=t} \frac{1}{n_1+n_2+1}\binom{n_1}{k_1}\binom{n_2}{k_2}\binom{n_1+n_2}{k_1+k_2}^{-1} = \frac{1}{n_1+n_2+1},$$ where the sum runs over all pairs $(k_1,k_2)$ with the observed total $k_1+k_2=t$ fixed (the conditioning used by Fisher’s exact test), normalisation can be obtained by multiplying with $n_1+n_2+1$, and thus: $$p(k_2) = \binom{n_1}{k_1}\binom{n_2}{k_2}\binom{n_1+n_2}{k_1+k_2}^{-1}.$$
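As a numerical sanity check (a sketch, with function names of my own choosing): the normalised $p(k_2)$ is the hypergeometric probability underlying Fisher’s exact test, and conditional on the total $k_1+k_2$ it sums to one:

```python
from math import comb

def p_k2(n1, k1, n2, k2):
    # Normalised conditional probability: the hypergeometric pmf of
    # Fisher's exact test for margins n1, n2 and fixed total k1 + k2.
    return comb(n1, k1) * comb(n2, k2) / comb(n1 + n2, k1 + k2)

# Conditional on the total t = k1 + k2, the probabilities sum to 1.
n1, n2, t = 6, 5, 4
total = sum(p_k2(n1, k1, n2, t - k1)
            for k1 in range(max(0, t - n2), min(n1, t) + 1))
```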

Wrzlprmft

I had a similar idea independently and explored it a little further. The result you provide is related to Fisher's exact test, but Fisher's test is conditional. The classic example of this is the lady tasting tea experiment: eight cups of tea are prepared; in four of them the milk is added first, and in the other four the tea is added first. The lady must guess in which cups the milk was added first. The key point is that, whatever her answer, she will choose exactly four cups.
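For concreteness, the conditioning makes the chance that the lady identifies all four milk-first cups purely by guessing a simple hypergeometric computation (a small illustration):

```python
from math import comb

# Eight cups, four milk-first; she must name exactly four cups.
# Probability of all four guesses being correct under pure chance:
p_all_correct = comb(4, 4) * comb(4, 0) / comb(8, 4)  # = 1/70, about 0.014
```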

It turns out that the idea of integrating over the binomial distributions can easily be extended to an unconditional test (tentatively called the m-test). This means that $p\left(k_2\right)$ is compared to every possible result with $n_1$ and $n_2$ independent trials, for any allowed values of $k_1$ and $k_2$. The m-test can be extended to more experiments and more outcomes (not only success and failure). It is also relatively easy to test a one-sided hypothesis (the $p$ value for $p_1 > p_2$). In case you find it interesting, I uploaded the details to arXiv here, and an R package to apply the test here. My collaborator and I found that the m-test seems to be more powerful than Fisher's exact test and Barnard's test when $n_1$ and $n_2$ are low.
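My reading of the unconditional idea can be sketched as follows; this is my own illustration under the uniform-prior joint probability from the question, not the authors' implementation (see their arXiv preprint and R package for the actual definition):

```python
from math import comb

def joint_prob(n1, k1, n2, k2):
    # Probability of the pair (k1, k2) under a common success rate p,
    # integrated over a uniform prior on p; sums to 1 over all pairs.
    return comb(n1, k1) * comb(n2, k2) / ((n1 + n2 + 1) * comb(n1 + n2, k1 + k2))

def unconditional_pvalue(n1, k1, n2, k2):
    # Total probability of every outcome pair that is no more likely
    # than the observed one -- the analogue of Fisher's rule, but
    # without conditioning on the total k1 + k2.
    p_obs = joint_prob(n1, k1, n2, k2)
    return sum(joint_prob(n1, a, n2, b)
               for a in range(n1 + 1) for b in range(n2 + 1)
               if joint_prob(n1, a, n2, b) <= p_obs * (1 + 1e-12))  # tolerate float ties
```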

vqf