
I have a 2x2 contingency table with the following values: $$A = 20, B = 10, C = 200, D = 300$$ As you can see, the total sample size ($N = 530$) is much larger than $20$, the threshold below which Fisher's exact test is usually recommended. I read this post that suggested a simple $\frac{N-1}{N}$ adjustment. However, I don't quite understand why it works or how to apply it.
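The linked post is not reproduced here, so the following is an assumption: a $\frac{N-1}{N}$ factor usually refers to the "N - 1" chi-squared test, which multiplies Pearson's chi-squared statistic by $(N-1)/N$ and compares the result with a chi-squared distribution on 1 degree of freedom. A minimal standard-library sketch (the function names are illustrative, not from any library):

```python
import math


def pearson_chi2(a, b, c, d):
    """Pearson's chi-squared statistic for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    # 2x2 shortcut: N (ad - bc)^2 / (product of the two row totals and the two column totals)
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def n_minus_1_chi2(a, b, c, d):
    """Assumed 'N - 1' variant: Pearson's statistic scaled by (N - 1) / N, referred to chi-squared(1)."""
    n = a + b + c + d
    stat = pearson_chi2(a, b, c, d) * (n - 1) / n
    # Upper tail of a chi-squared variable with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = n_minus_1_chi2(20, 10, 200, 300)
print(f"N-1 chi-squared = {stat:.3f}, p = {p_value:.4f}")
```

With $N = 530$ the factor is $529/530$, so this adjustment barely changes the result; it only matters noticeably for small samples.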

So, is there a more general way (for large sample sizes) to compute this hypergeometric distribution, other than just using Fisher's test?
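For a 2x2 table with all margins fixed, Fisher's exact test already is the hypergeometric test: the top-left count follows a hypergeometric distribution, so nothing needs to be rescaled for large $N$. A minimal sketch that computes the two-sided p-value directly from that distribution with Python's exact integer arithmetic (`math.comb`, Python 3.8+), assuming the common definition of the two-sided p-value as the total probability of all tables no more likely than the observed one (the function name is illustrative):

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for [[a, b], [c, d]], computed directly from the
    hypergeometric distribution of the top-left cell with all margins held fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):
        # Number of tables with top-left cell x and the observed margins; dividing by
        # comb(n, row1) turns this into the hypergeometric probability P(X = x).
        return comb(col1, x) * comb(n - col1, row1 - x)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    observed = weight(a)
    # Sum over every table that is no more likely than the observed one.
    # Exact integers avoid overflow and rounding, even for large N.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return float(Fraction(extreme, comb(n, row1)))


print(fisher_exact_two_sided(20, 10, 200, 300))
```

The result should agree with `fisher.test` in R and `scipy.stats.fisher_exact` in Python up to floating-point details.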

I considered a chi-squared test, but I'm in need of something more conservative.
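If "more conservative" means a chi-squared test that is less eager to reject, the classical option is Yates' continuity correction, which shrinks $|ad - bc|$ by $N/2$ before squaring. A minimal sketch for a 2x2 table (function name illustrative); for 2x2 tables it should agree with SciPy's `scipy.stats.chi2_contingency`, which applies this correction by default (`correction=True`) when there is one degree of freedom:

```python
import math


def yates_chi2(a, b, c, d):
    """Pearson's chi-squared for [[a, b], [c, d]] with Yates' continuity correction."""
    n = a + b + c + d
    # Shrink |ad - bc| by N/2 (never below zero); this lowers the statistic
    # and makes the test more conservative than the uncorrected version.
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-squared with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = yates_chi2(20, 10, 200, 300)
print(f"Yates-corrected chi-squared = {stat:.3f}, p = {p_value:.4f}")
```

The Yates correction is often described as overly conservative, so it is a deliberate trade-off rather than a free improvement.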

Lastly, as a side note, I saw this documentation page for the Fisher test in Python. However, I can't find much information on whether it works well for large sample sizes. Also, it has a note saying that the odds ratio it returns is the unconditional maximum likelihood estimate, whereas R uses the conditional one. I would also appreciate some hints on which version is better and what their respective equations look like.
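The conditional/unconditional distinction concerns the odds-ratio estimate reported alongside the p-value, not the p-value itself. Writing the table as [[a, b], [c, d]], the unconditional (sample) estimate is $\hat\psi = ad/(bc)$. The conditional MLE treats all margins as fixed, so the top-left count $X$ follows Fisher's noncentral hypergeometric distribution $$P(X = x \mid \psi) = \frac{\binom{a+c}{x}\binom{b+d}{a+b-x}\psi^x}{\sum_y \binom{a+c}{y}\binom{b+d}{a+b-y}\psi^y},$$ and $\hat\psi_{\text{cond}}$ maximizes $P(X = a \mid \psi)$, which is equivalent to solving $E_\psi[X] = a$. To my knowledge SciPy's `fisher_exact` reports the first and R's `fisher.test` the second. A plain-Python sketch of both (function names are illustrative, and the bisection range is an assumption):

```python
import math


def sample_odds_ratio(a, b, c, d):
    """Unconditional (sample) odds ratio ad / bc for the table [[a, b], [c, d]]."""
    return math.inf if b * c == 0 else a * d / (b * c)


def conditional_mle_odds_ratio(a, b, c, d, tol=1e-10):
    """Conditional MLE of the odds ratio: solves E_psi[X] = a, where X is the top-left
    count under Fisher's noncentral hypergeometric distribution with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo_x, hi_x = max(0, row1 - (n - col1)), min(row1, col1)
    if lo_x == hi_x:
        return math.nan  # only one table is possible, so the odds ratio is not estimable
    if a == lo_x:
        return 0.0  # likelihood keeps increasing as psi -> 0
    if a == hi_x:
        return math.inf  # likelihood keeps increasing as psi -> infinity
    xs = range(lo_x, hi_x + 1)
    # Log of the central weights C(a + c, x) * C(b + d, a + b - x)
    log_w = [math.log(math.comb(col1, x)) + math.log(math.comb(n - col1, row1 - x)) for x in xs]

    def mean_x(log_psi):
        # E_psi[X], using a log-sum-exp shift to avoid overflow
        terms = [lw + x * log_psi for lw, x in zip(log_w, xs)]
        top = max(terms)
        weights = [math.exp(t - top) for t in terms]
        return sum(x * w for x, w in zip(xs, weights)) / sum(weights)

    # E_psi[X] increases with psi; bisect on log(psi), assuming the root lies in [-50, 50]
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_x(mid) < a:
            lo = mid
        else:
            hi = mid
    return math.exp((lo + hi) / 2)


table = (20, 10, 200, 300)
print(sample_odds_ratio(*table), conditional_mle_odds_ratio(*table))
```

For large counts like these the two estimates are close; they differ more for small tables. Recent SciPy releases (1.10 and later, I believe) also provide `scipy.stats.contingency.odds_ratio`, whose `kind` argument selects `"conditional"` or `"sample"`.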

Peter Flom
Christian
  • Why do you think the chi-squared test is too liberal? All expected values seem large enough to me. Fisher's exact test is still valid, but it becomes numerically harder for such large sample sizes. – Knarpie Sep 02 '17 at 12:11
  • If you have large N and large expected frequencies in all cells (as you do here) then why do you want Fisher's exact test at all? – Peter Flom Sep 02 '17 at 12:34
  • Running your example on my machine using R and the fisher.test command is virtually instantaneous. – mdewey Sep 02 '17 at 12:48
  • I have to write software in Python that reads data and performs the statistical analysis, hence my question regarding the conditional/unconditional maximum likelihood. Between the answers provided by R and Python, which should I use in which situations? @mdewey – Christian Sep 02 '17 at 18:44
  • From my understanding, Fisher's exact test is more accurate. However, I guess that with this much data, the chi-square test might give accurate values anyway... hmm. @PeterFlom – Christian Sep 02 '17 at 18:45
  • Actually, after posting, I did some more research and I'm now more convinced that I should be using a chi-square test. Why would Fisher's exact test still be valid if more than 80% of the expected values are greater than 5? Or is that not a hard-and-fast rule? – Christian Sep 02 '17 at 18:53
  • My professor suggested that I use a hypergeometric test for our data. However, I wasn't sure how to adapt that test to a 2x2 contingency table other than by using Fisher's exact test, especially since it is based on the hypergeometric distribution. But I'm open to new suggestions, given the right reasoning, so that I can convince my professor which test to use instead of the one he suggested. @PeterFlom – Christian Sep 02 '17 at 18:56
  • There is no reason NOT to use Fisher's exact test, except computation time. No adjustment should be needed. – Peter Flom Sep 02 '17 at 19:10
  • So what about the rule that at least 80% of expected values should be greater than 5? @PeterFlom – Christian Sep 02 '17 at 19:22
  • That is a reason TO use it, not a reason NOT to do so. Once all the expected frequencies are large, you don't HAVE to use Fisher's, but you still can. – Peter Flom Sep 02 '17 at 19:43
  • @Christian The 'rule' you're talking about is a rough rule of thumb relating to when the asymptotic chi-squared approximation to the null distribution of the chi-squared test statistic may be poor (the rule says not to use the ordinary chi-square when too many of the expected values are below 5). In that case sometimes people choose the Fisher Exact test, but higher expected counts are not an inherent reason to avoid Fisher's Exact test. There's nothing to "modify"; the test does the same thing either way. – Glen_b Sep 02 '17 at 22:27
  • If you feel like that mostly answers your question I guess I could post it as an answer. (We have a lot of questions relating to this test which you should probably also check out; at least some of these briefly mention conditional vs unconditional calculations). Note that for the 2x2 case the Fisher test is based on the ordinary hypergeometric distribution. – Glen_b Sep 02 '17 at 23:26
  • Yes, the collective information from these comments definitely helps and serves as a great source of clarification! Please post as an answer so I can accept! – Christian Sep 03 '17 at 01:30
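The rule of thumb debated in these comments is usually stated as: no more than 20% of cells should have an expected count below 5 (equivalently, at least 80% at 5 or more). As Glen_b notes, it is a guide to when the chi-squared approximation may be poor, not a restriction on Fisher's exact test. A small plain-Python sketch (helper names are illustrative) that computes the expected counts under independence for the table in the question and the share of cells below 5:

```python
def expected_counts(table):
    """Expected cell counts under independence: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def share_below(table, threshold=5.0):
    """Fraction of cells whose expected count is below `threshold`."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)


table = [[20, 10], [200, 300]]
for row in expected_counts(table):
    print([round(e, 2) for e in row])
print("share of expected counts below 5:", share_below(table))
```

Here the smallest expected count is about 12.45 (that is, $30 \times 220 / 530$), so no cell falls below 5 and both the chi-squared test and Fisher's exact test are reasonable choices.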
