
In the documentation for a popular A/B testing tool, they say:

> For example, if you run an A/B testing experiment with a significance level of 95%, this means that if you determine a winner, you can be 95% confident that the observed results are real and not an error caused by randomness. It also means that there is a 5% chance that you could be wrong.

I believe this to be dead wrong.

Statistical significance at 95% means that $\text{p-value} < 0.05$. The p-value is defined as

$$\text{p-value} \equiv P( \text{reject the null} \;|\; \text{the null is true})$$

The complement of this is $$1 - \text{p-value} = P( \text{fail to reject the null} \;|\; \text{the null is true})$$

So 95% significance tells you $P( \text{fail to reject the null} \;|\; \text{the null is true}) > 0.95$.

Assuming "determining a winner" means rejecting the null, the part of the documentation that says "if you determine a winner, you can be 95% confident that the observed results are real and not an error caused by randomness" to me means $0.95 > P(\text{null is false} | \text{reject the null})$ which is flatly not equivalent.

Am I taking crazy pills here? Is there some reasonable way to convert whatever the heck they're saying with "95% confident that the observed results are real and not an error caused by randomness" into $P( \text{fail to reject the null} \;|\; \text{the null is true})$?
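
For concreteness, here is a quick simulation sketch of what the 5% actually controls (my own illustration, not from the documentation; a two-sample t-test with both "variants" drawn from the same distribution, so the null is true by construction):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 100_000
alpha = 0.05

# Both "variants" are drawn from the same distribution: the null is true.
a = rng.normal(0.0, 1.0, size=(n_experiments, 100))
b = rng.normal(0.0, 1.0, size=(n_experiments, 100))
_, p = stats.ttest_ind(a, b, axis=1)

# Fraction of experiments that "determine a winner" even though there is
# nothing to find: this estimates P(reject the null | the null is true).
print((p < alpha).mean())  # ~= 0.05
```

The rejection rate comes out around 5%, which is a probability conditional on the null being true; it tells me nothing about $P(\text{the null is false} \;|\; \text{reject the null})$, which is what the documentation seems to be claiming.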

TrynnaDoStat
  • 1. The documentation is completely wrong. 2. The question would be a better fit for Stack Exchange if it were phrased in a more general way, for instance "what do p-values mean?" (although then it would become a duplicate of https://stats.stackexchange.com/questions/166323/misunderstanding-a-p-value/166327). – Sextus Empiricus Aug 27 '19 at 22:05
  • My question is different because I want to know if there is any charitable way (even if it's a stretch) to read "a significance level of 95% means that if you determine a winner, you can be 95% confident that the observed results are real and not an error caused by randomness" as not dead wrong. – TrynnaDoStat Aug 27 '19 at 23:35
  • The confusion between 'significance level/p-value' and 'the probability that the alternative hypothesis is real/true' is wrong and has been covered several times on this site. Stating the question in a way that it is about 'ways to read it as not wrong' doesn't change it into something different. – Sextus Empiricus Aug 28 '19 at 06:36
  • I would respect a decision to close the question if it was deemed too specific or off topic. – TrynnaDoStat Aug 28 '19 at 16:01
  • See https://normaldeviate.wordpress.com/2013/03/14/double-misunderstandings-about-p-values/ for a related discussion – Adrian Aug 30 '19 at 04:16

1 Answer


You are right that the documentation is wrong.

Note that p-values are defined somewhat differently from what you write. They do not measure the probability of a decision, such as the decision to reject or to fail to reject the null; they measure the probability of test statistics. Whether or not to reject the null hypothesis is a subsequent decision based on the p-value and the alpha threshold.

Instead of test statistics, one often uses the shorthand "data":

$$\text{p-value} \equiv P( \text{data} \;|\; \text{the null is true}).$$

As you write, there is simply no way whatsoever to get from this to what the documentation claims, which is

$$\text{p-value}\; (\not\equiv)\; P( \text{the null is true} \;|\; \text{data}),$$

unless you are prepared to go the Bayesian route, posit priors, etc. There is not even "any charitable way (even if it's a stretch)". Nothing.
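
To see how far apart the two quantities can be, take some made-up numbers (purely illustrative, not from the question or the documentation): suppose a priori 10% of tested variants have a real effect, and the test has 80% power. Then Bayes' theorem gives

$$P( \text{the null is false} \;|\; \text{reject the null}) = \frac{0.8 \times 0.1}{0.8 \times 0.1 + 0.05 \times 0.9} = \frac{0.08}{0.125} = 0.64,$$

nowhere near the 95% confidence the documentation promises; and the number depends entirely on the prior proportion of real effects.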

This is a very common misunderstanding. The American Statistical Association (ASA) recently published a statement on p-values, and the misinterpretation we are discussing here is addressed by the second principle in the ASA document (see p. 131 in The American Statistician):

> P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
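
To check the made-up numbers from above by brute force, here is a small simulation sketch (again purely illustrative: 10% of variants truly differ, an effect size giving roughly 80% power, $\alpha = 0.05$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments = 100_000
alpha = 0.05

# Made-up prior: 10% of tested variants have a real effect.
effect_is_real = rng.random(n_experiments) < 0.10
# Effect size 0.4 gives roughly 80% power at n = 100 per arm, sd = 1.
effect = np.where(effect_is_real, 0.40, 0.0)

a = rng.normal(0.0, 1.0, size=(n_experiments, 100))
b = rng.normal(effect[:, None], 1.0, size=(n_experiments, 100))
_, p = stats.ttest_ind(a, b, axis=1)

reject = p < alpha
# Among experiments that "determine a winner", what fraction are real?
print(effect_is_real[reject].mean())  # ~= 0.64, not 0.95
```

Among the "winners", only about 64% correspond to real effects under these assumptions; the true figure depends entirely on the base rate of real effects, which the significance level alone cannot tell you.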

Stephan Kolassa