
I'm looking to test for set enrichment, and I'm wondering whether Fisher's exact test or the hypergeometric test is more appropriate (and, if there isn't a straightforward answer, what the relative merits are).

To lay out an example problem: I have a set of 400 objects, 150 of which belong to class A. I draw 50 objects, and there is an overlap of 15. I could lay this out as a 2x2 matrix (15, 50-15, 150-15, and 400 - 50 - 150 + 15) and use this matrix to run Fisher's exact test. Alternatively, I could use the hypergeometric distribution in R and do phyper(15, 150, 400-150, 50).

Is one of these preferable to the other and, if so, why? Thanks!

Jautis
    I don't understand the difference. Under the null hypothesis of no association, and with the marginal totals fixed, Fisher's test statistic has a hypergeometric distribution. – Michael R. Chernick Jun 29 '17 at 22:27
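
To make the comment above concrete with the numbers from the question, here is a small worked form. The symbol X for the overlap count is notation introduced here, not something from the original posts. With all margins fixed (400 objects, 150 in class A, 50 drawn), the overlap follows a hypergeometric distribution, and the one-sided p-values are just its tail sums:

```latex
P(X = k) = \frac{\binom{150}{k}\,\binom{250}{50-k}}{\binom{400}{50}},
\qquad k = 0, 1, \dots, 50

% enrichment (upper tail) and depletion (lower tail) for an observed overlap of 15
P(X \ge 15) = \sum_{k=15}^{50} P(X = k),
\qquad
P(X \le 15) = \sum_{k=0}^{15} P(X = k)
```

The expected overlap under the null is 50 × 150 / 400 = 18.75, so an observed overlap of 15 sits below expectation.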

2 Answers


The hypergeometric test assesses how extreme it is to observe x or more "good" cases (the overlap), so it is the same as a one-sided Fisher's exact test (where the alternative hypothesis is "greater", in R jargon). If you do not care about the direction of the effect, you can use a two-sided Fisher's exact test.
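
As a rough check of the equivalence described above, the following sketch uses the numbers from the question. The variable names (N, K, n, x, tab) are chosen here for illustration. It relies on the fact that phyper(q, ..., lower.tail = FALSE) returns P(X > q), so the "x or more" tail needs q = x - 1. The two-sided part assumes R's convention of summing the probabilities of all tables that are no more likely than the observed one.

```r
# Numbers from the question: 400 objects, 150 in class A, 50 drawn, overlap 15.
N <- 400; K <- 150; n <- 50; x <- 15

# 2x2 table laid out as in the question.
tab <- matrix(c(x, n - x, K - x, N - n - K + x), nrow = 2)

# Enrichment p-value P(X >= x).
# lower.tail = FALSE gives P(X > q), hence q = x - 1.
p_hyper  <- phyper(x - 1, K, N - K, n, lower.tail = FALSE)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
all.equal(p_hyper, p_fisher)

# Two-sided p-value: sum the probabilities of every possible overlap
# that is no more likely than the observed one (small relative tolerance).
support <- max(0, n - (N - K)):min(n, K)
d       <- dhyper(support, K, N - K, n)
p_two   <- sum(d[d <= dhyper(x, K, N - K, n) * (1 + 1e-7)])
all.equal(p_two, fisher.test(tab)$p.value)
```

If both all.equal() calls return TRUE, the two approaches agree for this table, and the only real choice is which tail (or both) matches the question being asked.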

emre

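Just a comment to show the result of the R commands: both approaches give the same value. Note that phyper(q, ...) returns the lower tail P(X ≤ q) by default, which is why it matches alternative = "less" here.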

> phyper(15, 150, 400-150, 50)
[1] 0.1549789
> fisher.test(matrix(c(15, 50-15, 150-15, 400 - 50 - 150 + 15), nrow = 2),
              alternative = "less")$p.value
[1] 0.1549789
SamGG