What's the formula for the Benjamini-Hochberg adjusted p-value?

Question

I understand the procedure and what it controls. So what's the formula for the adjusted p-value in the BH procedure for multiple comparisons?

Just now I realized the original BH didn't produce adjusted p-values, only adjusted the (non) rejection condition: https://www.jstor.org/stable/2346101. Gordon Smyth introduced adjusted BH p-values in 2002 anyways, so the question still applies. It's implemented in R as p.adjust with method BH.

amoeba · Accepted Answer · 2019-10-16T09:05:10.510

The famous seminal Benjamini & Hochberg (1995) paper described the procedure for accepting/rejecting hypotheses based on adjusting the alpha levels. This procedure has a straightforward equivalent reformulation in terms of adjusted $p$-values, but it was not discussed in the original paper. According to Gordon Smyth, he introduced adjusted $p$-values in 2002 when implementing p.adjust in R. Unfortunately, there is no corresponding citation, so it has always been unclear to me what one should cite if one uses BH-adjusted $p$-values.

Turns out, the procedure is described in the Benjamini, Heller, Yekutieli (2009):

An alternative way of presenting the results of this procedure is by presenting the adjusted $p$-values. The BH-adjusted $p$-values are defined as $$p^\mathrm{BH}_{(i)} = \min\Big\{\min_{j\ge i}\big\{\frac{mp_{(j)}}{j}\big\},1\Big\}.$$

This formula looks more complicated than it really is. It says:

First, order all $p$-values from small to large. Then multiply each $p$-value by the total number of tests $m$ and divide by its rank order.
Second, make sure that the resulting sequence is non-decreasing: if it ever starts decreasing, make the preceding $p$-value equal to the subsequent (repeatedly, until the whole sequence becomes non-decreasing).
If any $p$-value ends up larger than 1, make it equal to 1.

This is a straightforward reformulation of the original BH procedure from 1995. There might exist an earlier paper that explicitly introduced the concept of BH-adjusted $p$-values, but I am not aware of any.

Update. @Zenit found that Yekutieli & Benjamini (1999) described the same thing already back in 1999:

That's the answer I was expecting, +1. I remember reading about Gordon Smyth implementation of the adjusted _p_ value as well and not knowing who to cite, cool to see there's a "canon" citation to this. — Firebug, Apr 10 '19 at 17:52
I believe an even earlier reference exists: [Yekutieli and Benjamini (1999)](https://doi.org/10.1016/S0378-3758(99)00041-5) (pdf version available [here](http://thom.jouve.free.fr/work/thesis/sitecopy_save/Biblio/Current/Yekutieli1999.pdf)). Definition 2.4 describes how the original 1995 FDR procedure can be rephrased in terms of adjusted p-values. Credit to [this blog post](https://brainder.org/2011/09/05/fdr-corrected-fdr-adjusted-p-values/) where I found about this. — Zenit, Oct 15 '19 at 20:46
Thanks for the source @Zenit ! It's mildly weird how such a ubiquitous statistical method does not have a well known reference. — Firebug, Oct 15 '19 at 22:11

Aditya · Answer 2 · 2018-06-05T03:16:57.937

First a to the point answer. Consider that $p_0$ is the (single test) $p$ value associated with value $z_0$ of the test statistic. The Benjamini-Hochberg FDR is computed in two steps ($N_0$ = # pvalues $\le$ $p_0$, $N$ = # pvalues):

$\text{FDR }(p_0) = \frac{\quad p_0 \quad }{\frac{N_0}{N}}$
$\text{FDR }(p_i) = \min (\text{FDR}(p_i), \text{FDR}(p_{i+1}))$

Now let's understand this. The (Bayesian) underlying idea is that observations come from a mixture of two distributions:

$\pi_0 \: N$ observations from the null density $f_0(z)$
$(1-\pi_0) \: N$ observations from alternative density $f_1(z)$.

What is observed is the mixture of those two:

$f(z) = \pi_0 \cdot f_0(z) + (1-\pi_0) \cdot f_1(z)$

The (Bayesian) definitions are:

$\text{Fdr} = \frac{\pi_0 \: (1-F_0(z_0))}{(1-F(z))}$ (a fraction of the tail areas)
$\text{fdr} = \frac{\pi_0 \: f_0(z_0)}{f(z)}$ (a fraction of the tail densities)

As shown below, Fdr is equivalent to the Benjamini hocherg FDR when $\pi_0 \approx 1$ (which is the case in most bioinformatics studies)

(Based on Efron & Tibshirani's Computer Age Statistical Inference)

What's the formula for the Benjamini-Hochberg adjusted p-value?

2 Answers2

Linked

Related