
I am stuck trying to understand the following two statements (from the Wikipedia article on p-values):

  1. The p-value is the probability of obtaining results at least as extreme as those observed, given that the null hypothesis is true, whereas the significance level $\alpha$ is the probability of rejecting the null hypothesis given that it is true.

  2. If one defines a false positive rate as the fraction of all “statistically significant” tests in which the null hypothesis is actually true, several arguments suggest that this is at least about 30 percent for p-values that are close to 0.05.

This is more or less explained in Regina Nuzzo's 2014 Nature editorial. I predefined a significance level of 0.05, ran a single test, and got a p-value of 0.049. The second statement tells me that the chance of replicating this result with another sample is not 95% but much lower. (I think this should depend on prior probabilities, but the Wikipedia statement draws a more general conclusion.)

The questions are:

  1. Is the second statement correct? Does it assume that the prior probabilities of the two hypotheses are equal to 0.5? (See my sketch below.)

  2. How can one understand it intuitively?
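Here is my own sketch (not from the Wikipedia article) of why I think the prior has to enter. Writing $\pi_0 = P(H_0)$ for the prior probability that the null is true and $1-\beta$ for the power, the significance level conditions on $H_0$ being true, whereas the false discovery rate conditions on having rejected (this version counts all rejections at level $\alpha$, not only $p$-values close to 0.05):

$$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}), \qquad \mathrm{FDR} = P(H_0 \text{ true} \mid \text{reject } H_0) = \frac{\pi_0\,\alpha}{\pi_0\,\alpha + (1-\pi_0)(1-\beta)}.$$

For fixed $\alpha$ and power, this FDR still depends on $\pi_0$ (it tends to 0 as $\pi_0 \to 0$ and to 1 as $\pi_0 \to 1$), which is why I do not see how an unqualified "at least about 30 percent" can hold in general.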

amoeba
German Demidov
  • In your statement #2 it should read "false discovery rate" instead of "false positive rate". – amoeba Jan 14 '16 at 10:19
  • see http://stats.stackexchange.com/questions/166323/misunderstanding-a-p-value/166327#166327, section 1 is about p-values and section 2 about FDR –  Jan 14 '16 at 10:32
  • Why? Is it because the second statement suggests multiple testing? – German Demidov Jan 14 '16 at 10:33
  • @fcop, no, it is not even close. – German Demidov Jan 14 '16 at 10:34
  • @German Demidov; and this ... The FDR tells you that, if you perform many tests on the same sample and you find 1000 discoveries (i.e. rejections of $H_0$) then with an FDR of 0.38 you will have $0.38 \times 1000$ false discoveries. –  Jan 14 '16 at 10:38
  • @fcop it is not a question about frequentists' FDR, my initial tags included bayesian, it is a question about posterior probabilities...Here is the link that can make things clear: [link](https://swfsc.noaa.gov/uploadedfiles/divisions/prd/programs/etp_cetacean_assessment/of_p_values_and_bayes__a_modest_proposal.6.pdf) – German Demidov Jan 14 '16 at 10:43
  • You can put the bayesian tag back if you want, apologies for taking out. But please check your formulation of statement #2: "the fraction of all “statistically significant” tests in which the null hypothesis is actually true" is NOT called "false positive rate", this is known as "false discovery rate". – amoeba Jan 14 '16 at 11:54
  • @amoeba I cannot, it is a quote from the wiki: [P-values](https://en.wikipedia.org/wiki/P-value). So maybe we can edit the Wikipedia article. Actually, I would also like to understand whether the second statement is correct, and maybe edit it to be more general. – German Demidov Jan 14 '16 at 12:09
  • Any estimate of a false discovery rate (as opposed to calculating a rate conditional on the truth of some hypothesis) must involve assumptions about the prior probabilities of hypotheses. The assumptions may be plausible for some collections of tests (say those appearing in papers published in the journals of a particular field) but not for others. So unqualified statements like those in the Wikipedia article are unwise. [But now that I look at the article in question, I see that statement is followed by: "In order to arrive at this number, one needs to postulate something about the prior ... – Scortchi - Reinstate Monica Jan 14 '16 at 12:30
  • probability that a real effect exists". I'm a little puzzled about exactly what you're asking: Neither Wikipedia, nor any papers cited so far in this thread even seem to be claiming to be able to make such statements without using prior probabilities.] – Scortchi - Reinstate Monica Jan 14 '16 at 12:33
  • I see. I did not realize it was an exact quote. I have now formatted it as such. – amoeba Jan 14 '16 at 12:35
  • @Scortchi maybe I understood something wrong, but it states "at least about 30 percent"; however, if you test a highly probable hypothesis (high prior probability), this percentage is lower. Or maybe I am wrong; that is what I am trying to figure out. – German Demidov Jan 14 '16 at 12:56
  • @German Demidov Yes, it is lower. There are some numerical examples in the paper cited in my answer. – peuhp Jan 14 '16 at 13:31
  • @peuhp So we can conclude that the 2nd statement is wrong. That's good; I thought I did not understand something and that it was really true (the statement has been repeated in this form in several different papers). – German Demidov Jan 14 '16 at 13:45
  • Yes, it is wrong in general, but the authors think that in practice it is a reasonable lower bound (I am not arguing that this is the case, just trying to explain their position!). – peuhp Jan 14 '16 at 13:48

1 Answer


I suggest you read http://rsos.royalsocietypublishing.org/content/1/3/140216, which contains most of the elements you need.

To answer your first question: for a set of tests yielding $p$-values $\in [0.045, 0.05]$ with power $= 0.8$, the FDR (as defined in the second statement of your question) is 26% if there are as many tests with a true effect as tests with no true effect (page 9 of the paper). Notice that the restriction to $p$-values $\in [0.045, 0.05]$ is very important: the FDR decreases when smaller $p$-values are also included and/or when the proportion of tests with a true effect increases.
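If you want to see where this number comes from, here is a minimal simulation sketch under the same assumptions (my own code, not the paper's; the sample size and effect size are chosen so that a two-sample $t$-test has power of roughly 0.8 at $\alpha = 0.05$):

```python
# Minimal simulation sketch (my own code, not the paper's) of the setting on
# page 9 of the linked paper: many two-sample t-tests, half of them with a
# real effect, power roughly 0.8 at alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_tests = 200_000      # number of simulated experiments
n_per_group = 16       # observations per group
effect = 1.0           # true mean difference when H1 holds (sd = 1) -> power ~ 0.8
prop_real = 0.5        # proportion of tests where a real effect exists

has_effect = rng.random(n_tests) < prop_real
x = rng.normal(0.0, 1.0, size=(n_tests, n_per_group))
y = rng.normal(np.where(has_effect, effect, 0.0)[:, None], 1.0,
               size=(n_tests, n_per_group))
p_values = stats.ttest_ind(x, y, axis=1).pvalue

# FDR over all significant tests (p <= 0.05).
sig = p_values <= 0.05
print("FDR, all p <= 0.05:     ", (~has_effect[sig]).mean())

# FDR restricted to p-values just below 0.05.
near = (p_values >= 0.045) & (p_values <= 0.05)
print("FDR, p in [0.045, 0.05]:", (~has_effect[near]).mean())
```

It prints an FDR of roughly 6% when all $p \le 0.05$ are counted and roughly 25-30% when only $p$-values in $[0.045, 0.05]$ are kept, consistent with the 26% figure above.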

To answer your second question: the two statements are radically different. In statement 2 (the FDR), the ratio is obtained by averaging over all significant tests, pooling the real-effect and the no-real-effect cases (in a given proportion). In statement 1 (the type I error), the ratio is computed only over the (hypothetical) tests on all observations that could be generated under the null hypothesis of no real effect.
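As a rough numerical illustration of the difference (my own plug-in numbers, using the 50/50 proportion and power 0.8 from the paper, and counting all tests with $p \le 0.05$ rather than only those close to 0.05): the type I error rate is $\alpha = 0.05$ whatever the proportion of true effects, whereas

$$\mathrm{FDR} = \frac{0.5 \times 0.05}{0.5 \times 0.05 + 0.5 \times 0.8} \approx 0.06,$$

a value that grows towards 1 as the proportion of no-effect tests increases (and reaches about 26% when only $p$-values close to 0.05 are kept, as above).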

peuhp
  • Yes, but again, from the abstract: _If you use p = 0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time._ Isn't it true only for two hypotheses with P(H0) = P(H1) = 0.5? (I guess your words about the ratio of true-effect to no-effect tests mean the same.) Also, they say nothing about the power of the test. I believe it can be explained from two points of view: yours, using the power of the tests, and a Bayesian one, using only prior probabilities... – German Demidov Jan 14 '16 at 11:42
  • @GermanDemidov Yes, 26% is derived with P(H0) = P(H1) = 0.5 (though I am not sure it can be written in these terms, since everything is frequentist here) and a given power that they consider reasonable, i.e. 0.8. Obviously the results depend on the power of the test, but it may be hard to generalize... – peuhp Jan 14 '16 at 13:07
  • IMHO, the interest is not in the 30% figure itself but in being aware of the effect. Nevertheless, I agree that a Bayes factor with a well-designed prior on the models (which is another problem) yields a more satisfactory solution (see https://swfsc.noaa.gov/uploadedFiles/Divisions/PRD/Programs/ETP_Cetacean_Assessment/Of_P_Values_and_Bayes__A_Modest_Proposal.6.pdf for this aspect). Hope it helps. – peuhp Jan 14 '16 at 13:15
  • Yes, I provided this link in the comments to the main question =) The next question for me is how to calculate the priors; do you know of any papers/books on this? I have big trouble understanding the Bayesian way of thinking... I also do not understand how to compare the "power-of-tests" approach with the Bayesian one; they can definitely give different FDRs for the same data... – German Demidov Jan 14 '16 at 13:21
  • @GermanDemidov IMHO this last question cannot be answered without a more precise description of what you are trying to do, i.e. what kind of comparison you are looking for. IMHO, as mentioned in the comments on your question, the "30% statement" is very unspecific and gives a figure to something that is in practice very dependent on the application... maybe some other users will answer this more specifically. – peuhp Jan 14 '16 at 13:30