
One post says that we should check the distribution of p-values before applying FDR correction. If the p-value distribution doesn't behave well (e.g. it is U-shaped, or not uniformly distributed in the tail toward 1), there might be a problem with your data or your model assumptions.

However, I'm confused by another post, which says that 'The FDR does not assume a uniform distribution of p-values'.

Which one should I follow? Can I use BH-FDR if the p-values are not uniformly distributed?

Thanks in advance!

  • Your first link says that uniform p-values are a bad thing. Your second link says FDR does not require uniform p-values (which makes sense, because a uniform distribution implies there are no effects to find). Where is the conflict? – John Apr 04 '16 at 23:34
  • The conflict is whether I should do something else to correct the U-shaped p-value distribution. – user3915365 Apr 04 '16 at 23:53
  • And that has nothing to do with a uniform p-distribution assumption which is what your question is currently focused on. I suggest you either extensively edit this question or delete it and open a new one that asks what you want to ask. Either way also say what you've done with respect to the recommendations in the first link for scenario C. Don't confuse the requirement that the small number of hypotheses close to 1 be uniform with a uniform distribution. – John Apr 04 '16 at 23:57
  • When you were looking at a distribution of p-values, how did you know the null was true? – Glen_b Apr 05 '16 at 00:11
  • @Glen_b We don't know whether the null is true. But we expect the p-value distribution to follow scenario A or B in the first post. – user3915365 Apr 05 '16 at 23:32
  • @John I tried to filter out some tests (e.g. those with low read coverage in ChIP-seq), but the distribution is U-shaped or even J-shaped (scenario D in the first post). – user3915365 Apr 05 '16 at 23:35
  • Ah, it's okay, I took something you wrote to mean something different from your intention; I understand what you mean now (and will delete the corresponding comments; they'll only help me I suspect). Can you please edit more context (in particular from the `varianceexplained` post) into your question, which would make it clearer that you're discussing p-values under mixtures of cases with $H_0$ true and false, and potentially where other assumptions about the situation might be mistaken? Questions should be able to stand on their own, even if their outside links die. A summary/quotes may help. – Glen_b Apr 05 '16 at 23:52
  • Note in particular that the varianceexplained post says "... ***some*** *kinds of FDR control are based on the assumption that your p-values near 1 are uniform*" (emphasis mine). So you may not have an actual disagreement between those two. – Glen_b Apr 05 '16 at 23:58

1 Answer

Benjamini-Hochberg is valid as long as the null p-values are superuniform, which means that for all $t \in [0, 1]$:

$$ \Pr[P_i \leq t \mid H_0] \leq t $$

This holds with equality ("$=$") for uniform null p-values. It also holds for U-shaped mixture distributions: if the left peak of the U corresponds to the alternatives, then the uniform component plus the peak close to 1 corresponds to the null distribution, which is consequently superuniform. Superuniformity also holds for discrete tests, whose p-values cannot be exactly uniformly distributed because of the discreteness.
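To make the claim concrete, here is a minimal simulation sketch, not a definitive implementation. It implements the standard BH step-up rule and estimates the FDR when the null p-values are superuniform. Everything specific in it is an assumption chosen for illustration: the function names `benjamini_hochberg` and `simulate_fdr` are hypothetical, the null p-values are drawn as the maximum of two uniforms (so $\Pr[P \leq t \mid H_0] = t^2 \leq t$, with a peak near 1), the alternative p-values are drawn as $U^4$ (a peak near 0), and all p-values are independent. Together the two components give a U-shaped histogram.

```python
import random


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the BH step-up rule rejects."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values (step-up: earlier failures are ignored).
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def simulate_fdr(n_null=80, n_alt=20, alpha=0.05, n_rep=1000, seed=0):
    """Estimate the FDR of BH under an assumed U-shaped mixture of p-values."""
    rng = random.Random(seed)
    total_fdp = 0.0
    for _ in range(n_rep):
        # Assumed null model: max of two uniforms, so Pr[P <= t] = t^2 <= t.
        nulls = [max(rng.random(), rng.random()) for _ in range(n_null)]
        # Assumed alternative model: U^4, which concentrates mass near 0.
        alts = [rng.random() ** 4 for _ in range(n_alt)]
        rejected = benjamini_hochberg(nulls + alts, alpha)
        false_rejections = sum(rejected[:n_null])
        total_rejections = sum(rejected)
        # False discovery proportion, defined as 0 when nothing is rejected.
        total_fdp += false_rejections / max(total_rejections, 1)
    return total_fdp / n_rep


if __name__ == "__main__":
    print("estimated FDR:", simulate_fdr())
```

Under these assumptions the estimated FDR should stay below `alpha`, even though the overall p-value histogram is far from uniform. Note that the simulation only illustrates the independent case; it says nothing about dependent tests.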

air
  • Thanks for this answer. Would you have any pointers to papers or websites that explain this in more detail? – dlaehnemann Jul 22 '19 at 10:14
  • Just came across this other answer by @air that gives some more insight into how Benjamini-Hochberg FDR control works, and also provides some references I'll dig into: https://stats.stackexchange.com/a/178350/254369 – dlaehnemann Jul 22 '19 at 11:34
  • I think what you define here is actually "super-uniformity", not "sub-uniformity". I've dug into this a bit and there's a question where I provide a bunch of references for "super-uniformity" and "stochastical dominance over a standard uniform random variable" (https://stats.stackexchange.com/q/419005/254369). Should I edit your answer or would you prefer to do that? – dlaehnemann Jul 24 '19 at 15:34
  • Also, I have not yet found a proper reference stating that the null p-values need to be "super-uniform" or "stochastically larger than uniform". Where was this point originally proven? Or is it just worded differently in the literature? – dlaehnemann Jul 24 '19 at 15:36
  • @dlaehnemann Thank you, excellent catch about superuniform. I fixed it now. I feel this result, that superuniformity (instead of uniformity) suffices for FDR control, is almost a statistical "folklore" result. It just happens that for many FDR control methods the argument goes through unchanged with superuniform p-values, sometimes having to replace an "$=$" with a "$\leq$" in a step of the proof, but this still works to bound the FDR. Sometimes the authors spell it out, sometimes not. – air Jul 26 '19 at 05:39
  • One reference where the authors spell it out rigorously is "Two simple sufficient conditions for FDR control" by Roquain and Blanchard (EJS, 2008). See just a bit after their Definition 2.1. – air Jul 26 '19 at 05:40
  • Thanks for the update and the reference. Funnily, that's the one I was already reading, and they nicely state the superuniformity assumption, although they phrase it as "stochastically lower bounded by a uniform random variable on [0, 1]". Yet another version of superuniformity to add to my other question... ;) – dlaehnemann Jul 26 '19 at 11:26