Let's say I have a $2\times2$ table where all of the expected counts are at least 1, and no more than 20 percent of the expected counts are less than 5 (hence, Cochran's rules are not violated). In such a situation, is there a difference between using Fisher's exact test and the chi-squared test to test for independence or homogeneity?

    # Observed 2x2 table of counts
    tab <- data.frame(x = c(20, 26), y = c(14, 40))
    chisq.test(tab)   # applies Yates' continuity correction by default (correct = TRUE)
    fisher.test(tab)

I get a p-value of 0.09 and 0.10 for the chi-squared test and Fisher's exact test respectively.
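
As a minimal sketch of how the premise could be checked, the snippet below pulls the expected counts out of `chisq.test()` and tests them against Cochran's rules as stated above. The names `tab`, `expected`, and `cochran_ok` are illustrative and not part of the original question.

```r
# Observed counts from the question
tab <- data.frame(x = c(20, 26), y = c(14, 40))

# Expected counts under independence, as computed by chisq.test()
expected <- chisq.test(tab)$expected

# Cochran's rules: every expected count is at least 1, and
# no more than 20% of the expected counts are below 5
all_at_least_one <- all(expected >= 1)
share_below_five <- mean(expected < 5)
cochran_ok <- all_at_least_one && share_below_five <= 0.20

print(expected)
print(cochran_ok)
```

Note that the rules are checked on the expected counts, not on the observed ones.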

gung - Reinstate Monica
Nathgun
  • You get different p-values, so there must be a difference. Can you clarify what your question is? – gung - Reinstate Monica Nov 20 '17 at 19:33
  • My question was: in a situation where a $2\times2$ table doesn't violate Cochran's rules, is one test preferable to the other if we are testing for independence? – Nathgun Nov 20 '17 at 20:07
  • If the marginals were fixed in advance, you should use Fisher's exact test (cf. [Dataset for studying and teaching Fisher's exact test](https://stats.stackexchange.com/q/248001/7290)). In general, the chi-squared test will have more power, as you found in your case (cf. [Given the power of computers these days, is there ever a reason to do a chi-squared test rather than Fisher's exact test?](https://stats.stackexchange.com/q/14226/)). – gung - Reinstate Monica Nov 20 '17 at 20:57

1 Answer

Fisher's test conditions on the marginal totals being fixed at their observed values. The chi-squared test does not. Under the assumption of fixed marginals, Fisher's test is exact. The chi-squared test is still an asymptotic approximation, even in your case where Cochran's rules hold.
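
To illustrate what conditioning on the margins means in practice, here is a rough sketch that rebuilds the two-sided Fisher p-value from the hypergeometric distribution of the top-left cell, using the table from the question. The variable names are illustrative, and the small relative tolerance is an assumption intended to mirror how R's `fisher.test()` handles ties.

```r
# Table from the question: rows are (20, 14) and (26, 40)
tab <- matrix(c(20, 26, 14, 40), nrow = 2)

r1 <- sum(tab[1, ])   # first row total
r2 <- sum(tab[2, ])   # second row total
c1 <- sum(tab[, 1])   # first column total

# With all margins held fixed, the top-left cell is hypergeometric
support <- max(0, c1 - r2):min(r1, c1)
probs   <- dhyper(support, m = r1, n = r2, k = c1)
p_obs   <- dhyper(tab[1, 1], m = r1, n = r2, k = c1)

# Two-sided p-value: total probability of all tables with the same margins
# that are no more probable than the observed one
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])

p_fisher <- fisher.test(tab)$p.value
print(all.equal(p_manual, p_fisher))
```

No large-sample approximation enters this calculation: the p-value is a finite sum of exact probabilities over tables sharing the observed margins.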

Michael R. Chernick
  • I understand that the marginal totals are fixed in Fisher's exact test, and that we look at tables as extreme as or more extreme than the observed table. But why are the marginal totals not fixed in the chi-squared case, where each expected count is (observed row total × observed column total) / observed grand total? I thought these observed row and column totals were fixed as well. – Nathgun Nov 20 '17 at 19:57
  • With Fisher's test the multinomial distribution depends on the marginal totals. Summing $(\text{obs}-\text{exp})^2/\text{exp}$ does not lead to an asymptotic distribution that depends on the marginal totals. – Michael R. Chernick Nov 20 '17 at 20:04
  • I meant hypergeometric distribution in the case of Fisher's test. – Michael R. Chernick Nov 20 '17 at 20:13
  • Some people do argue for conditioning on the margins even when your margins are not fixed. (That's a debate that has gone on for what must be more than 80 years now.) – Glen_b Nov 20 '17 at 22:54
  • @Glen_b I understand that for at least 80 years there has been controversy about conditioning on the fixed marginals with Fisher's exact test, but what do you mean by conditioning on marginals that are not fixed? – Michael R. Chernick Nov 20 '17 at 23:00
  • There's really no controversy over conditioning on what is already fixed. The controversy arises over conditioning on margins that aren't fixed (when conditioning would make a difference). The margins are nearly ancillary, which is in large part why many people have argued for conditioning on them. Yates (1984) "[Tests of Significance for 2 × 2 Contingency Tables](https://www.jstor.org/stable/2981577?seq=1#page_scan_tab_contents)", JRSS-A 147:3 426-463 gives a pretty good summary of the history, along with some of the arguments (and changes of heart) that went on up to that point. – Glen_b Nov 20 '17 at 23:17
  • I understand all that, but given the data for a contingency table you have known marginal totals, and Fisher's test looks at all such tables with the same marginals as the data. Doesn't that mean the marginals are fixed? How does one condition on marginals that are not fixed? – Michael R. Chernick Nov 20 '17 at 23:34
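
The expected-count formula and the summed $(\text{obs}-\text{exp})^2/\text{exp}$ statistic discussed in the comments above can be sketched as follows. This is only an illustration on the question's table; the continuity correction is switched off (`correct = FALSE`) so that the hand computation lines up with `chisq.test()`, and the variable names are illustrative.

```r
# Table from the question: rows are (20, 14) and (26, 40)
tab <- matrix(c(20, 26, 14, 40), nrow = 2)

# Expected counts: (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic: sum of (obs - exp)^2 / exp over the four cells
stat <- sum((tab - expected)^2 / expected)

# The p-value comes from the chi-squared distribution with 1 degree of freedom,
# a large-sample reference distribution
p_manual <- pchisq(stat, df = 1, lower.tail = FALSE)

res <- chisq.test(tab, correct = FALSE)
print(all.equal(stat, unname(res$statistic)))
print(all.equal(p_manual, res$p.value))
```

The observed margins are used to estimate the expected counts, but the reference distribution for the statistic is the same chi-squared distribution with 1 degree of freedom whatever those margins are.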