Questions tagged [false-discovery-rate]

The expected fraction of rejected null hypotheses that are falsely rejected, i.e. the fraction of "significant" findings that are actually not true. One method to control the FDR in multiple testing is the Benjamini-Hochberg procedure.

The False Discovery Rate (abbreviated FDR) is the expected fraction of rejected null hypotheses that are falsely rejected, i.e., the fraction of "significant" findings that are actually not true. These falsely rejected hypotheses are called "false discoveries". Given $V$ false discoveries among $R$ rejected hypotheses in total, the FDR is formally defined as

$$ FDR = E\left[\frac{V}{R}\right], $$

where $V/R$ is taken to be $0$ when no hypotheses are rejected (i.e., when $R = 0$).

Controlling the false discovery rate has become a popular method for dealing with the multiple comparisons problem, and has seen wide acceptance in a variety of fields.

Benjamini and Hochberg introduced this method in 1995 [1]. Their procedure works as follows:

Sort the $m$ p-values from all tests conducted in ascending order, $P_{(1)} \leq P_{(2)} \leq \cdots \leq P_{(m)}$. For a given level $\alpha$, find the largest $k$ such that $P_{(k)} \leq \frac{k}{m}\alpha$, and reject the hypotheses corresponding to the $k$ smallest p-values, $H_{(1)}, \ldots, H_{(k)}$ (if no such $k$ exists, reject nothing).
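
As an illustration, here is a minimal R sketch of this step-up rule; the helper name bh_reject and the simulated p-values are purely illustrative, and in practice the built-in p.adjust() (see below) gives the same rejection set.

    # Minimal sketch of the Benjamini-Hochberg step-up rule.
    # bh_reject is a hypothetical helper; p.adjust(p, method = "BH") is the
    # standard R equivalent.
    bh_reject <- function(p, alpha = 0.05) {
      m <- length(p)
      o <- order(p)                        # indices of the p-values, smallest first
      thresholds <- seq_len(m) / m * alpha # BH thresholds k/m * alpha
      passed <- which(p[o] <= thresholds)  # ranks whose sorted p-value passes
      k <- if (length(passed) > 0) max(passed) else 0
      reject <- logical(m)                 # all FALSE by default
      if (k > 0) reject[o[seq_len(k)]] <- TRUE
      reject
    }

    # Example: 90 p-values from true nulls plus 10 from a shifted alternative.
    set.seed(1)
    p <- c(runif(90), pnorm(rnorm(10, mean = 3), lower.tail = FALSE))
    table(manual = bh_reject(p), builtin = p.adjust(p, method = "BH") <= 0.05)

The table should show that the manual rule and p.adjust() agree on which hypotheses are rejected at level 0.05.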

Benjamini and Yekutieli later showed that the above method remains valid under certain dependency conditions, specifically under a form of dependence known as positive regression dependency, and they extended the procedure to handle arbitrary dependency [2].
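
For reference, the procedure they propose for arbitrary dependency (often called the BY procedure) keeps the same step-up rule but shrinks the thresholds by a harmonic-sum factor; in the notation above,

$$ P_{(k)} \leq \frac{k}{m \, c(m)}\,\alpha, \qquad c(m) = \sum_{i=1}^{m} \frac{1}{i}. $$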

Several modifications and extensions of the FDR method proposed by Benjamini and Hochberg have appeared, notably:

  1. The $q$-value extension by John D. Storey, implemented in the qvalue R package available on Bioconductor and GitHub [3,4]; a minimal usage sketch follows this list. See also the Shiny web implementation of the qvalue package [10].
  2. Local false discovery rates, implemented in the R package fdrtool on CRAN [5,6].
  3. Stratified FDR (sFDR), as implemented in Lei Sun's Perl script SFDR [7,8].
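
As a rough sketch of the q-value approach (assuming the Bioconductor qvalue package is installed and p is a vector of p-values, e.g. the simulated one above):

    # Sketch of q-value estimation with Storey's qvalue package.
    library(qvalue)

    qobj <- qvalue(p)          # estimate q-values from the raw p-values
    head(qobj$qvalues)         # one q-value per test
    qobj$pi0                   # estimated proportion of true null hypotheses
    sum(qobj$qvalues <= 0.05)  # tests called significant at an FDR of 5%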

The original procedure (often referred to as the BH procedure) is available as a standard option in many software packages; in R it is the "BH" method of p.adjust(), i.e. p.adjust(p, method = "BH"). The Benjamini-Yekutieli extension for arbitrary dependency is available as p.adjust(p, method = "BY") [9].
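
For example, continuing with the simulated p-values p from the sketch above (the 0.05 cutoff is arbitrary):

    # Adjusted p-values under the BH and the more conservative BY procedure.
    p_bh <- p.adjust(p, method = "BH")
    p_by <- p.adjust(p, method = "BY")

    # BY typically rejects fewer hypotheses than BH at the same level.
    c(BH = sum(p_bh <= 0.05), BY = sum(p_by <= 0.05))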

References and Further Reading

[1] Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57(1), 289–300.

[2] Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29(4), 1165–1188. http://doi.org/10.1214/aos/1013699998

[3] Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64(3), 479–498.

[4] https://github.com/jdstorey/qvalue

[5] Efron, B., Tibshirani, R., Storey, J. D., and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96, 1151-1160.

[6] https://cran.r-project.org/web/packages/fdrtool/fdrtool.pdf

[7] Sun, L., Craiu, R. V., Paterson, A. D., & Bull, S. B. (2006). Stratified false discovery control for large-scale hypothesis testing with application to genome-wide association studies. Genetic Epidemiology, 30(6), 519–530. http://doi.org/10.1002/gepi.20164

[8] http://www.utstat.toronto.edu/sun/Software/SFDR/

[9] https://stat.ethz.ch/R-manual/R-devel/library/stats/html/p.adjust.html

[10] http://qvalue.princeton.edu/

254 questions

43 votes, 2 answers
What are the practical differences between the Benjamini & Hochberg (1995) and the Benjamini & Yekutieli (2001) false discovery rate procedures?

My statistics program implements both the Benjamini & Hochberg (1995) and Benjamini & Yekutieli (2001) false discovery rate (FDR) procedures. I have done my best to read through the later paper, but it is quite mathematically dense and I am not…

39 votes, 5 answers
The meaning of "positive dependency" as a condition to use the usual method for FDR control

Benjamini and Hochberg developed the first (and still most widely used, I think) method for controlling the false discovery rate (FDR). I want to start with a bunch of P values, each for a different comparison, and decide which ones are low enough…

32 votes, 5 answers
How should an individual researcher think about the false discovery rate?

I've been trying to wrap my head around how the False Discovery Rate (FDR) should inform the conclusions of the individual researcher. For example, if your study is underpowered, should you discount your results even if they're significant at…

30 votes, 2 answers
FPR (false positive rate) vs FDR (false discovery rate)

The following quote comes from the famous research paper Statistical significance for genome wide studies by Storey & Tibshirani (2003): For example, a false positive rate of 5% means that on average 5% of the truly null features in the study…

27 votes, 3 answers
Why aren't multiple hypothesis corrections applied to all experiments since the dawn of time?

We know that we must apply Benjamini Hochberg-like corrections for multiple hypothesis testing to experiments based on a single data set, in order to control the false discovery rate, otherwise all experiments that give a positive result could be…

27 votes, 4 answers
Do underpowered studies have increased likelihood of false positives?

This question has been asked before here and here but I don't think the answers address the question directly. Do underpowered studies have increased likelihood of false positives? Some news articles make this assertion. For example: Low…

24 votes, 1 answer
Plain language meaning of "dependent" and "independent" tests in the multiple comparisons literature?

In both the family-wise error rate (FWER) and false discovery rate (FDR) literature, particular methods of controlling FWER or FDR are said to be appropriate to dependent or independent tests. For example, in the 1979 paper "A Simple Sequentially…

22 votes, 3 answers
Confusion with false discovery rate and multiple testing (on Colquhoun 2014)

I have read this great paper by David Colquhoun: An investigation of the false discovery rate and the misinterpretation of p-values (2014). In essence, he explains why false discovery rate (FDR) can be as high as $30\%$ even though we control for…

19 votes, 1 answer
Why is controlling FDR less stringent than controlling FWER?

I have read that controlling FDR is less stringent than controlling FWER, such as in Wikipedia: FDR controlling procedures exert a less stringent control over false discovery compared to familywise error rate (FWER) procedures (such as the…

19 votes, 2 answers
What's the formula for the Benjamini-Hochberg adjusted p-value?

I understand the procedure and what it controls. So what's the formula for the adjusted p-value in the BH procedure for multiple comparisons? Just now I realized the original BH didn't produce adjusted p-values, only adjusted the (non) rejection…

15 votes, 1 answer
An intuitive explanation why the Benjamini-Hochberg FDR procedure works?

Is there a simple way of explaining why does Benjamini and Hochberg's (1995) procedure actually control the false discovery rate (FDR)? This procedure is so elegant and compact and yet the proof of why it works under independence (appearing in the…

13 votes, 1 answer
Power of FDR vs. FWER approaches in multiple comparisons

Regarding multiple comparisons, could someone please explain me why the power of false discovery rate (FDR) is greater than the power of family-wise error rate (FWER)?

11 votes, 2 answers
Proof/derivation for false discovery rate in Benjamini-Hochberg procedure

The Benjamini-Hochberg procedure is a method that corrects for multiple comparisons and has a false discovery rate (FDR) equal to $\alpha$. Or is it the family wise error rate, FWER? I am a bit confused about this. According to my below computations…

11 votes, 3 answers
How do FDR procedures estimate a False Discovery Rate without a model of base rates?

Can someone explain how FDR procedures are able to estimate an FDR without a model / assumption of the base rate of true positives?

10 votes, 1 answer
Controlling False Discovery Rate in Stages

I have a three dimensional table of size $6\times6\times81$. Each cell of the table is a hypothesis test. Slicing the table on the third dimension produces $81$ sets of hypothesis tests which are independent between sets but dependent within sets.…