In my work I have seen several uses of Fisher's exact test, and I was wondering how well it fits my data. From several sources I understood how to calculate the statistic, but I never found a clear and formal explanation of the assumed null hypothesis.

Can someone please explain, or refer me to a formal explanation of, the assumed distribution? I would be grateful for an explanation in terms of the values in the contingency table.

Frank Harrell
Amit Lavon


In the $2\times 2$ case the distributional assumption is given by two independent binomial random variables $X_1 \sim \text{Bin}(n_1, \theta_1)$ and $X_2 \sim \text{Bin}(n_2, \theta_2)$, and the null hypothesis is the equality $\theta_1=\theta_2$. But Fisher's exact test is a conditional test: it relies on the conditional distribution of $X_1$ given $X_1+X_2$. This is a noncentral hypergeometric distribution with a single unknown parameter, the odds ratio $\psi=\frac{\theta_1/(1-\theta_1)}{\theta_2/(1-\theta_2)}$, and the null hypothesis then becomes $\psi=1$.
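Spelled out in terms of the table: let $X_1$ and $X_2$ be the counts in the first column (say, the numbers of successes among the $n_1$ and $n_2$ subjects in the two rows) and let $m = X_1 + X_2$ be the first column total. Dividing $P(X_1 = x)\,P(X_2 = m - x)$ by its sum over all $x$ compatible with the margins, every factor that does not involve $x$ cancels, and what remains depends on $(\theta_1, \theta_2)$ only through $\psi$:

$$
P(X_1 = x \mid X_1 + X_2 = m)
  = \frac{\binom{n_1}{x}\binom{n_2}{m-x}\,\psi^{x}}
         {\sum_{u=\max(0,\,m-n_2)}^{\min(n_1,\,m)} \binom{n_1}{u}\binom{n_2}{m-u}\,\psi^{u}},
\qquad
\psi = 1 \;\Rightarrow\;
P(X_1 = x \mid X_1 + X_2 = m) = \frac{\binom{n_1}{x}\binom{n_2}{m-x}}{\binom{n_1+n_2}{m}}.
$$

The second expression (obtained from Vandermonde's identity) is the central hypergeometric distribution, which is the null distribution Fisher's test uses to compute its $P$-value.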

This distribution (Fisher's noncentral hypergeometric distribution) has its own Wikipedia page.
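As a quick numerical check that the conditional distribution depends on $(\theta_1, \theta_2)$ only through $\psi$, here is a minimal R sketch. The helper names `cond_pmf` and `via_binomials`, and the second pair of probabilities $(7/12, 1/2)$, are hypothetical choices made for illustration; both pairs share $\psi = 1.4$. At $\psi = 1$ the closed form should coincide with R's `dhyper`.

```r
# Closed form of P(X1 = x | X1 + X2 = m) (noncentral hypergeometric);
# cond_pmf is a hypothetical helper written for this illustration.
cond_pmf <- function(x, n1, n2, m, psi) {
  u <- max(0, m - n2):min(n1, m)                  # values of X1 compatible with the margin
  w <- choose(n1, u) * choose(n2, m - u) * psi^u  # unnormalised weights
  w[u == x] / sum(w)
}

# The same conditional probability computed directly from the two binomials
via_binomials <- function(x, n1, n2, m, th1, th2) {
  u <- 0:m
  dbinom(x, n1, th1) * dbinom(m - x, n2, th2) /
    sum(dbinom(u, n1, th1) * dbinom(m - u, n2, th2))
}

odds <- function(p) p / (1 - p)
n1 <- 27; n2 <- 56; m <- 21; x <- 7

# Two hypothetical (theta1, theta2) pairs with the same odds ratio psi = 1.4
a <- via_binomials(x, n1, n2, m, 7/27, 14/70)
b <- via_binomials(x, n1, n2, m, 7/12, 1/2)
psi <- odds(7/27) / odds(14/70)
print(c(a = a, b = b, closed_form = cond_pmf(x, n1, n2, m, psi)))

# Under H0 (psi = 1) the closed form is the ordinary hypergeometric pmf
print(c(closed_form = cond_pmf(x, n1, n2, m, 1), dhyper = dhyper(x, n1, n2, m)))
```

Since `a` and `b` agree, the data can carry information about $\psi$ only, which is why the conditional test is a test of $\psi = 1$.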

To evaluate it with R, you can simply use the formula defining the conditional probability:

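```r
# Hypothetical success probabilities: any pair with the same odds ratio gives the same result
p1 <- 7/27
p2 <- 14/70
# Observed counts: x1 successes out of n1, x2 successes out of n2
x1 <- 7; n1 <- 27
x2 <- 14; n2 <- 56
m <- x1 + x2   # the conditioning total X1 + X2
# P(X1 = x1 | X1 + X2 = m) = P(X1 = x1) P(X2 = m - x1) / sum over all splits of m
dbinom(x1, n1, p1)*dbinom(x2, n2, p2)/sum(dbinom(0:m, n1, p1)*dbinom(m-(0:m), n2, p2))
# [1] 0.1818838
```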

Or use the dnoncenhypergeom function of the MCMCpack package:

```r
psi <- p1/(1-p1)/(p2/(1-p2)) # this is the odds ratio
MCMCpack::dnoncenhypergeom(x=x1, n1, n2, x1+x2, psi)
# [1] 0.1818838
```
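To put this in terms of the contingency table itself: under $H_0: \psi = 1$ the conditional distribution is the ordinary hypergeometric (R's `dhyper`), and the usual two-sided $P$-value adds up the null probabilities of all tables with the same margins that are no more probable than the observed one. The sketch below does this by hand for the same hypothetical counts (7 of 27 vs 14 of 56) and compares the result with `fisher.test`. `fisher_p` is a hypothetical helper, and the small relative tolerance for near-ties mirrors, as far as I know, what `fisher.test` does internally.

```r
# Observed 2x2 table (hypothetical counts from the example above):
# group 1: x1 = 7 successes out of n1 = 27; group 2: x2 = 14 out of n2 = 56
x1 <- 7; n1 <- 27
x2 <- 14; n2 <- 56

# Two-sided Fisher P-value from the central hypergeometric (psi = 1).
# fisher_p is a hypothetical helper written for this illustration.
fisher_p <- function(x1, n1, x2, n2) {
  m <- x1 + x2                                # first column total, fixed by conditioning
  support <- max(0, m - n2):min(n1, m)        # feasible values of X1 given the margins
  probs <- dhyper(support, n1, n2, m)         # P(X1 = x | X1 + X2 = m) under H0
  p_obs <- dhyper(x1, n1, n2, m)
  # sum the probabilities of tables no more likely than the observed one,
  # with a small relative tolerance for floating-point ties
  sum(probs[probs <= p_obs * (1 + 1e-7)])
}

tab <- matrix(c(x1, n1 - x1, x2, n2 - x2), nrow = 2, byrow = TRUE,
              dimnames = list(group = c("1", "2"), outcome = c("success", "failure")))
p_manual <- fisher_p(x1, n1, x2, n2)
p_builtin <- fisher.test(tab)$p.value
print(c(manual = p_manual, fisher.test = p_builtin))
```

Note that only the margins ($n_1$, $n_2$, $m$) and the single cell $x_1$ enter the calculation; the other three cells are determined by them.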
Stéphane Laurent

Fisher's so-called "exact" test makes the same kind of subtle assumptions that $\chi^2$ tests make.

  • The two variables being assessed for association are truly polytomous all-or-nothing variables such as dead/alive or US/Europe. If one or both of the variables is a simplification of an underlying continuum, categorical data analysis should not be undertaken at all.
  • There are no other relevant background variables. If $Y$ is the outcome variable and $X$ is a variable being assessed for association with $Y$, the probability that $Y=y$ is identical for every subject with $X$ fixed at $x$. Contingency tables assume in effect that there is no heterogeneity in the distribution of $Y$ that is not accounted for by $X$. For example, in a randomized clinical trial studying the effect of treatment A vs. B on the probability of death, a $2\times 2$ contingency table test assumes that every subject on treatment A has the same probability of death. [One could argue that this is too stringent an assumption, but that position doesn't recognize the loss of power from doing unadjusted tests of association.]

Fisher's test makes one assumption not made by unconditional tests of association such as Pearson's $\chi^2$ test: that we are interested in the "current" marginal distributions of both $X$ and $Y$, that is, that we are conditioning on the frequencies of the $Y$ outcome categories. This is not reasonable for prospective studies. The use of Fisher's test leads to conservatism: its $P$-values are, on average, too large, because the test is constructed to guarantee that they are not too small. On average, Pearson $\chi^2$ $P$-values are more accurate than Fisher's, even when some cells have expected frequencies far below 5.
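The conservatism can be seen exactly, without simulation, for small samples. Under $H_0$ the two counts are independent $\text{Bin}(n_1,\theta)$ and $\text{Bin}(n_2,\theta)$, so the actual (unconditional) type I error rate is the total binomial probability of the tables that the test rejects at level $\alpha$. A minimal R sketch, assuming the hypothetical values $n_1 = n_2 = 15$, $\theta = 0.3$ and $\alpha = 0.05$; `fisher_p` and `pearson_p` are hypothetical helpers, the latter computing the uncorrected Pearson statistic by hand (it should match `chisq.test(..., correct = FALSE)` for non-degenerate tables) and treating a table with an empty column as not rejected.

```r
# Exact (unconditional) size of two-sided tests when H0: theta1 = theta2 is true.
# n1, n2, theta and alpha are hypothetical values chosen for illustration.
n1 <- 15; n2 <- 15; theta <- 0.3; alpha <- 0.05

# Two-sided Fisher P-value from the central hypergeometric (hypothetical helper)
fisher_p <- function(x1, n1, x2, n2) {
  m <- x1 + x2
  support <- max(0, m - n2):min(n1, m)
  probs <- dhyper(support, n1, n2, m)
  sum(probs[probs <= dhyper(x1, n1, n2, m) * (1 + 1e-7)])
}

# Pearson chi-square P-value without continuity correction (hypothetical helper)
pearson_p <- function(x1, n1, x2, n2) {
  m <- x1 + x2; N <- n1 + n2
  if (m == 0 || m == N) return(1)             # empty column: statistic undefined, no rejection
  obs  <- c(x1, n1 - x1, x2, n2 - x2)
  expd <- c(n1 * m, n1 * (N - m), n2 * m, n2 * (N - m)) / N
  pchisq(sum((obs - expd)^2 / expd), df = 1, lower.tail = FALSE)
}

size_fisher <- 0; size_pearson <- 0
for (x1 in 0:n1) for (x2 in 0:n2) {
  pr <- dbinom(x1, n1, theta) * dbinom(x2, n2, theta)   # probability of this table under H0
  if (fisher_p(x1, n1, x2, n2) <= alpha) size_fisher <- size_fisher + pr
  if (pearson_p(x1, n1, x2, n2) <= alpha) size_pearson <- size_pearson + pr
}
print(c(fisher = size_fisher, pearson = size_pearson, nominal = alpha))
```

Because the conditional test has level at most $\alpha$ for every value of the margin, the Fisher size printed here cannot exceed $\alpha$; how far below $\alpha$ it falls, and how close the Pearson size comes to it, depend on $n_1$, $n_2$ and $\theta$, so it is worth varying them.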

Frank Harrell
  • Thank you @FrankHarrell. Can you give references for your claim about chi-square P-values being more accurate than Fisher's? – Amit Lavon Dec 30 '14 at 13:38
  • See for example http://www.citeulike.org/user/harrelfe/tag/fishers-exact-test. This has been discussed at length on Stack Exchange. – Frank Harrell Dec 30 '14 at 14:21
  • Sadly, CiteULike is gone, and web.archive.org seems to have crawled only the first page of the harrelfe account. – Glen_b Mar 17 '20 at 01:31
  • https://www.zotero.org/groups/2199991/feh/tags/fishers-exact-test/library – Frank Harrell Mar 17 '20 at 19:48