I wanted to understand Fisher's exact test better, so I devised the following toy example, where f and m correspond to female and male, and n and y indicate soda consumption (no and yes):
> soda_gender
  f m
n 0 5
y 5 0
Obviously, this is a drastic simplification, but I didn't want the context to get in the way. Here I just assumed that males don't drink soda and females drink soda, and wanted to see if the statistical procedures come to the same conclusion.
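For reproducibility, the table above could be built along these lines. This is only a sketch: the row and column labels are copied from the printout, but the exact construction is an assumption and not necessarily how the object was originally created.

```r
# Rows: soda consumption (n = no, y = yes); columns: gender (f = female, m = male).
# matrix() fills column by column, so the first two values form the "f" column.
soda_gender <- matrix(c(0, 5, 5, 0), nrow = 2,
                      dimnames = list(c("n", "y"), c("f", "m")))
soda_gender
```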
When I run Fisher's exact test in R, I get the following results:
> fisher.test(soda_gender)
Fisher's Exact Test for Count Data
data: soda_gender
p-value = 0.007937
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.0000000 0.4353226
sample estimates:
odds ratio
0
Here, since the p-value is 0.007937, we would conclude that gender and soda consumption are associated.
I know that Fisher's exact test is related to the hypergeometric distribution, so I wanted to get a similar result using that. In other words, you can view this problem as follows: there are 10 balls, of which 5 are labeled "male" and 5 are labeled "female"; you draw 5 balls at random without replacement and see 0 male balls. What is the chance of this observation? To answer this question, I used the following command:
> phyper(q=0,m=5,n=5,k=5,lower.tail=TRUE)
[1] 0.003968254
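To put that number in context, here is a sketch that lists the probability of every possible count of male balls under the same urn setup, using `dhyper` with the same `m`, `n`, and `k` as in the `phyper` call. The names `x` and `probs` are introduced here purely for illustration.

```r
# Possible numbers of "male" balls among the 5 drawn.
x <- 0:5

# Hypergeometric probability of each count: 5 male balls, 5 female balls, 5 draws.
probs <- dhyper(x, m = 5, n = 5, k = 5)
names(probs) <- x
probs

# phyper(q = 0, ..., lower.tail = TRUE) is P(X <= 0), which is just P(X = 0),
# so it should agree with the first entry of probs.
all.equal(unname(probs[1]),
          phyper(q = 0, m = 5, n = 5, k = 5, lower.tail = TRUE))
```

With equal numbers of male and female balls the distribution is symmetric, so drawing 5 male balls is exactly as probable as drawing 0.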
My questions are: 1) Why are the two results different? 2) Is there anything incorrect or not rigorous in my reasoning above?