
I am trying to apply Fisher's exact test in a simulated genetics problem, but the p-values appear to be skewed to the right. Being a biologist, I guess I'm just missing something obvious to every statistician, so I would greatly appreciate your help.

My setup is this: (setup 1, marginals not fixed)
Two samples of 0s and 1s are randomly generated in R. Each sample has n = 500, with equal probabilities of sampling 0 and 1. I then compare the proportions of 0/1 in the two samples with Fisher's exact test (just fisher.test; I also tried other software with similar results). Sampling and testing are repeated 30 000 times. The resulting p-values are distributed like this:

[histogram of the p-value distribution]
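Since the R code added later covers only setup 2, here is a minimal sketch of how setup 1 can be reproduced (the names, the seed, and the reduced repetition count are my own):

```r
# Sketch of setup 1 (free marginals); 1000 repetitions here for speed,
# the actual runs used 30000.
set.seed(1)
n    <- 500
reps <- 1000
p1   <- numeric(reps)
for (i in seq_len(reps)) {
  s1 <- rbinom(n, 1, 0.5)                        # sample 1: fair 0/1 draws
  s2 <- rbinom(n, 1, 0.5)                        # sample 2: independent, same law
  tab <- rbind(table(factor(s1, levels = 0:1)),  # 2x2 table: rows = samples,
               table(factor(s2, levels = 0:1)))  # columns = allele 0/1 counts
  p1[i] <- fisher.test(tab)$p.value
}
hist(p1, breaks = 50, col = "grey", xlab = "p-values", main = "")
```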

The mean of all the p-values is around 0.55, with the 5th percentile at 0.0577. The distribution even appears discontinuous on the right side.

I've been reading everything I can, but I can't find any indication that this behavior is normal - on the other hand, it's just simulated data, so I see no source of bias. Is there an adjustment I missed? Are the sample sizes too small? Or maybe the p-values are not supposed to be uniformly distributed here and are interpreted differently?
Or should I just repeat this a million times, find the 0.05 quantile, and use that as the significance cutoff when I apply the test to actual data?

Thanks!


Update:

Michael M suggested fixing the marginal counts of 0s and 1s. The p-values now have a much nicer-looking distribution - unfortunately, it is not uniform, nor of any other shape I recognize:

[histogram of p-values with fixed marginals]

Adding the actual R code (setup 2, marginals fixed):

samples = c(rep(1, 500), rep(2, 500))   # group labels: two samples of 500
alleles = c(rep(0, 500), rep(1, 500))   # exactly 500 zeros and 500 ones overall
p = numeric(30000)                      # preallocate instead of growing p[i]
for (i in 1:30000) {
  alleles = sample(alleles)             # permute alleles; both margins stay fixed
  p[i] = fisher.test(samples, alleles)$p.value
}
hist(p, breaks = 50, col = "grey", xlab = "p-values", main = "")

Final edit:
As whuber points out in the comments, the areas only look distorted due to binning. I am attaching the QQ-plots for setup 1 (free marginals) and setup 2 (fixed marginals). Similar plots appear in Glen's simulations below, and all these results in fact look rather uniform. Thanks for the help!

[QQ-plots of the p-values against the uniform distribution, setups 1 and 2]
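For reference, a QQ-plot against the uniform distribution (and the empirical CDF whuber suggests in the comments) can be drawn along these lines; `p` here is a uniform stand-in, substitute the vector of simulated p-values:

```r
# QQ-plot of p-values against the uniform distribution, plus the empirical CDF
set.seed(2)
p <- runif(5000)                         # stand-in; use the simulated p-values
qqplot(qunif(ppoints(length(p))), p,
       xlab = "Uniform quantiles", ylab = "Observed p-values")
abline(0, 1)                             # uniformity => points hug the diagonal
plot(ecdf(p))                            # empirical CDF; uniform => straight line
```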

juod
  • Try to repeat your simulation while holding not only the group sizes (500 each) but also the total number of "1"s (over the pooled sample) constant. The p-value of Fisher's exact test is derived under this "fixed marginal distribution" setting. Does the picture look better then? Btw, you cannot expect the p-value distribution to be exactly uniform, because of the discrete nature of the sampling distribution (i.e. the hypergeometric). – Michael M Jul 21 '15 at 19:57
  • Thanks for the suggestion, although it didn't solve the problem. I'll update the question right away. – juod Jul 21 '15 at 20:30
  • It might be useful to have a look at your R code. – conjugateprior Jul 21 '15 at 20:47
  • Added the R code. I get similar results if I export the columns of randomized 0/1s and calculate Fisher's test in genetic software called PLINK - I was thinking, maybe it's some property of the test itself? – juod Jul 21 '15 at 21:48
  • You should set the seed for reproducibility. I used `set.seed(6069)`. I get a histogram with the same profile as yours, but I notice that `mean(p < .05)` … – gung - Reinstate Monica Jul 21 '15 at 22:10
  • The p-value is uniformly distributed if the null hypothesis is true. That's probably not the case for a majority of the tests. – Glen Jul 21 '15 at 22:34
  • @Glen it seems to me from the code that in each iteration both samples have the same number of 0s and 1s (i.e. the null hypothesis should hold) - or am I wrong? – bdeonovic Jul 21 '15 at 22:55
  • @Glen, bdeonovic: Let's say sample 1 has some number A of 1s, and sample 2 has some number B of 1s. A and B are not equal here - if they were, the test statistic would be the same every time - but A+B stays constant. In other words, there are always 500 zeros and 500 ones, but sometimes they are divided equally between the samples and sometimes they are not. Thinking this way, _"no difference between samples"_ is probably the most frequent case - but that should be reflected in the test statistic, not the p-value... Am I understanding this correctly? – juod Jul 21 '15 at 23:08
  • @juod I see, then Glen's comment stands to explain the phenomenon: p-values are uniformly distributed under the null hypothesis, which does not hold in each of your simulation iterations. – bdeonovic Jul 21 '15 at 23:22
  • These histograms look remarkably uniform to me. You have to remember that histograms display probability (or frequency) by means of *area*. The increasing gaps to the right (due to the unavoidable discreteness of the p-value distribution of any nonrandomized test of discrete data) cause the bar heights to increase, *but their areas seem to be almost constant.* Instead of using a histogram to assess uniformity, graph the empirical CDF. – whuber Jul 21 '15 at 23:45
  • Aside from the specific distribution, this question appears to be completely answered [here](http://stats.stackexchange.com/questions/153249/distribution-of-p-values-binomial-test/) – Glen_b Jul 22 '15 at 09:43

1 Answer


The problem is that the data are discrete, so histograms can be deceiving. I coded a simulation with QQ-plots that shows an approximately uniform distribution.

library(lattice)
set.seed(5545)
TotalNo = 300                       # column totals of the 2x2 table
TotalYes = 450

pvalueChi = rep(NA, 10000)
pvalueFish = rep(NA, 10000)

for (i in 1:10000) {
  # both columns drawn with the same success probability, so the null holds
  MaleAndNo = rbinom(1, TotalNo, .3)
  FemaleAndNo = TotalNo - MaleAndNo
  MaleAndYes = rbinom(1, TotalYes, .3)
  FemaleAndYes = TotalYes - MaleAndYes
  x = matrix(c(MaleAndNo, FemaleAndNo, MaleAndYes, FemaleAndYes), nrow = 2, ncol = 2)
  pvalueChi[i] = chisq.test(x)$p.value
  pvalueFish[i] = fisher.test(x)$p.value
}

dat = data.frame(pvalue = c(pvalueChi, pvalueFish),
                 type = rep(c('Chi-Squared', 'Fishers'), each = 10000))
histogram(~ pvalue | type, data = dat, breaks = 10)
# QQ-plots against the uniform make near-uniformity much easier to judge
qqmath(~ pvalue | type, data = dat, distribution = qunif,
       panel = function(x, ...) {
         panel.qqmathline(x, ...)
         panel.qqmath(x, ...)
       })

[histograms of the p-values for both tests]

[QQ-plots of the p-values for both tests]
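The discreteness is easy to check directly. With both margins fixed at 500/500, as in the question's setup 2, the 2x2 table is determined by a single cell count, so the two-sided Fisher p-value has a small finite support. A sketch (mine, not part of the original answer):

```r
# With margins fixed at 500/500, the table is determined by one cell:
# k = number of 1s in sample 1. Enumerate plausible k and collect p-values.
a  <- 200:300
pv <- sapply(a, function(k)
  fisher.test(matrix(c(k, 500 - k, 500 - k, k), nrow = 2))$p.value)
length(unique(round(pv, 10)))   # far fewer distinct values than tables tested
```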

Glen
  • Such spikes and dips in histograms or bar charts of *discrete* data are often artifacts of the binning procedure. Don't trust them: use more discerning plots of the distributions, such as QQ plots or graphs of the ECDFs. Even if they are real, nobody will care provided the distributions of p-values are approximately uniform and of the right density where it matters for decision making: in the interval close to zero (and certainly less than 0.5). – whuber Jul 21 '15 at 23:47
  • Excellent point @whuber, I'll update with qqplots. – Glen Jul 22 '15 at 00:02
  • @whuber, Glen, thanks so much! In fact the binning was deceptive: simply splitting Glen's histograms into more breaks gave a pattern similar to mine. I also get a linear empirical CDF/QQ with my simulations, so the problem seems to be solved. – juod Jul 22 '15 at 00:10
  • @juod: it would be highly appreciated if you could add the qqplot for illustration, maybe even for both simulations? – Michael M Jul 22 '15 at 05:38
  • The QQ plots really help - thank you. Don't you want to change the first paragraph of your answer, though? Do you still maintain there is a problem with the simulation and that there is a "spike" in the p-value distribution? – whuber Jul 22 '15 at 15:14