I wanted to understand Fisher's exact test better, so I devised the following toy example, where f and m correspond to female and male, and n and y correspond to "soda consumption", like this:

> soda_gender

    f m
  n 0 5
  y 5 0
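
For anyone who wants to reproduce this, a table like the one above could be built roughly as follows. This is only a sketch: the row and column labels are taken from the printout, and the use of `matrix` with `dimnames` is an assumption about how the object was created.

```r
# Build the 2x2 table shown above; matrix() fills column by column,
# so the first two values form column "f" and the last two form column "m".
soda_gender <- matrix(c(0, 5, 5, 0), nrow = 2,
                      dimnames = list(c("n", "y"), c("f", "m")))
soda_gender
```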

Obviously, this is a drastic simplification, but I didn't want the context to get in the way. Here I just assumed that males don't drink soda and females drink soda, and wanted to see if the statistical procedures come to the same conclusion.

When I run Fisher's exact test in R, I get the following results:

> fisher.test(soda_gender)
Fisher's Exact Test for Count Data

data:  soda_gender
p-value = 0.007937
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.0000000 0.4353226
sample estimates:
odds ratio 
         0 

Here, since the p-value is 0.007937, we would conclude that gender and soda consumption are associated.

I know that Fisher's exact test is related to the hypergeometric distribution, so I wanted to get a similar result using that. In other words, you can view this problem as follows: there are 10 balls, of which 5 are labeled "male" and 5 are labeled "female"; you draw 5 balls at random without replacement and see 0 male balls. What is the chance of this observation? To answer this question, I used the following command:

> phyper(q=0,m=5,n=5,k=5,lower.tail=TRUE)
[1] 0.003968254
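
As a sanity check on that number, the same probability can be written out from the hypergeometric formula: one way to pick 0 male and 5 female balls, out of choose(10, 5) = 252 equally likely draws. The sketch below compares the formula with `dhyper` and `phyper`; the variable names are only illustrative.

```r
# P(0 male balls in 5 draws) from an urn with 5 male and 5 female balls
p_formula <- choose(5, 0) * choose(5, 5) / choose(10, 5)  # 1 / 252

# Point probability and lower-tail probability coincide at q = 0,
# because 0 is the smallest possible count.
p_dhyper <- dhyper(0, m = 5, n = 5, k = 5)
p_phyper <- phyper(q = 0, m = 5, n = 5, k = 5, lower.tail = TRUE)

c(formula = p_formula, dhyper = p_dhyper, phyper = p_phyper)
```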

My questions are: 1) How come the two results are different? 2) Is there anything incorrect or not rigorous in my reasoning above?

AdamO
Alby

1 Answer

Fisher's exact test works by conditioning upon the table margins (in this case, 5 males and 5 females, and 5 soda drinkers and 5 non-drinkers). Under the null hypothesis, and because of these margin totals, the four cells (male soda drinker, male non-soda drinker, female soda drinker, female non-soda drinker) all have the same probability, 0.25.

For the particular table you used, the only other table that is "at least as unlikely" under the null hypothesis is its converse: 5 female non-soda drinkers and 5 male soda drinkers. The two-sided FET p-value adds up the probabilities of all such tables, so you'll notice that doubling the probability you obtained from the hypergeometric distribution gives you the FET p-value (2 × 0.003968254 ≈ 0.007937).
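
To make the "at least as unlikely" rule concrete, here is a rough sketch that enumerates every table compatible with the margins and sums the probabilities that do not exceed the observed one. The variable names (`x`, `probs`, `p_obs`) and the small tolerance for floating-point ties are my own choices for illustration, and the table is rebuilt here only so the snippet is self-contained.

```r
# Observed table from the question
soda_gender <- matrix(c(0, 5, 5, 0), nrow = 2,
                      dimnames = list(c("n", "y"), c("f", "m")))

# With all margins fixed at 5, one cell determines the whole table.
# x = number of males among the 5 soda drinkers, which can be 0..5.
x <- 0:5
probs <- dhyper(x, m = 5, n = 5, k = 5)

# Probability of the observed table (0 male soda drinkers)
p_obs <- dhyper(0, m = 5, n = 5, k = 5)

# Two-sided p-value: total probability of tables no more likely than
# the observed one (tolerance guards against floating-point ties).
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

p_two_sided                       # 0.007936508, i.e. 2 / 252
fisher.test(soda_gender)$p.value  # should agree with the line above
```

Only x = 0 and x = 5 qualify here, which is why the sum is exactly twice the single hypergeometric probability.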

AdamO
  • Meng's notes on phyper and fisher.test (which do the same thing, but have a very different interface) are very helpful: http://mengnote.blogspot.qa/2012/12/calculate-correct-hypergeometric-p.html – Aditya Apr 14 '16 at 05:07