
According to Popper, the problem of induction means we can never verify a hypothesis; we can only aim to falsify it. If we repeatedly fail to falsify it, the hypothesis is said to be tentatively accepted. For Popper, all of science should consist of coming up with hypotheses and trying as hard as possible to falsify them.

In some introductions to statistical hypothesis testing, I have read that scientists aim to falsify the null hypothesis and that this is somehow in accordance with Popper's theory of falsification. Here is a posting stating this view. (User @Stefan commented on that posting, making exactly my point.)

I have three questions:

  1. Doesn't Popper say we should try to falsify the alternative hypothesis?
  2. Does falsifying the null hypothesis count as a failed falsification of the alternative hypothesis?
  3. This may be semantic sophistry, but shouldn't scientists try to corroborate the null hypothesis instead of trying to falsify it?

(If this posting belongs on the "philosophy" board, please move it there...)

kjetil b halvorsen

3 Answers


I was also going to point to Deborah Mayo's work, as linked in a comment. She is a Popper-influenced philosopher who has written a lot about statistical testing.

I'll try to address the questions.

(1a) Popper didn't think of statistical testing as formalising his approach at all. Mayo states that this is because Popper was not expert enough in statistics, but he probably also wouldn't have allowed an error probability of 5% or 1% to count as "falsification" (Mayo may have mentioned this somewhere too, but I don't remember).

(1b) There are different approaches to picking the null and alternative hypotheses. In some applications, the null hypothesis is a precise scientific theory of interest, and we check whether the data falsify it. This would be in line with Popper (at least if he allowed for nonzero error probabilities). In other approaches (in many areas these are found much more often), the null hypothesis formalises the idea that "nothing meaningful is going on", and the alternative is of actual scientific interest. This would not be in line with Popper. (Also, the alternative is normally not specified precisely enough to imply conditions for falsification, even statistical falsification.)

(2) According to the standard logic of statistical tests, the null hypothesis can be statistically (i.e. with error probability) falsified, but not the alternative. It is possible to argue that an alternative is statistically falsified, but this basically amounts to running the test the other way round. For example, if you have $H_0:\ \mu=0$ and the alternative $\mu\neq 0$, you cannot falsify the alternative (as it allows for $\mu$ arbitrarily close to 0, which cannot be distinguished from $\mu=0$ by data), but you could state that a meaningful deviation from $\mu=0$ would actually be $|\mu|\ge 2$, and in that case you may reject $|\mu|\ge 2$ if $\bar x$ is very close to zero. This makes sense if the power of the original test against $|\mu|\ge 2$ is large enough that "$\bar x$ close to zero" would then be very unlikely. (This is related to Mayo's concept of "severity"; in such a case we can say that $|\mu|<2$ holds "with severity".) We could also then say that we have "statistically falsified" $|\mu|\ge 2$.
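
To make the "reversed" test above concrete, here is a minimal sketch in Python. The simulated data, the margin of 2, and the use of two one-sided t-tests in the style of equivalence testing are illustrative assumptions on my part, not part of the answer's own setup:

```python
# Minimal sketch of "running the test the other way round":
# reject |mu| >= 2 when the sample mean is very close to zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.1, scale=1.0, size=100)  # data generated near mu = 0

# Standard test of H0: mu = 0 (this can never falsify the alternative).
t0, p0 = stats.ttest_1samp(x, popmean=0.0)
print(f"H0: mu = 0    -> p = {p0:.3f}")

# "Reversed" test: try to reject |mu| >= 2 via two one-sided t-tests
# (the TOST logic used in equivalence testing).
delta = 2.0
n = len(x)
se = x.std(ddof=1) / np.sqrt(n)
p_upper = stats.t.cdf((x.mean() - delta) / se, df=n - 1)  # H0: mu >= +2
p_lower = stats.t.sf((x.mean() + delta) / se, df=n - 1)   # H0: mu <= -2
p_rev = max(p_upper, p_lower)  # small p_rev: |mu| >= 2 "statistically falsified"
print(f"H0: |mu| >= 2 -> p = {p_rev:.2e}")
```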

(3) This is indeed a philosophical question, and I have seen arguments in either direction.

Christian Hennig
  • "he probably wouldn't have allowed for an error probability of 5% or 1% as 'falsification'": neither would Fisher. ISTR in one of his books he said the significance level should depend on the nature of the analysis, but that 1:20 is a useful level in many settings. – Dikran Marsupial Jul 26 '21 at 12:34
  • @Lewian, regarding (1b), isn't it a common convention that H_1 is always the hypothesis a scientist comes up with and thinks explains a phenomenon, while H_0 is always the opposite, often stating no effect? I have read quite a lot of introductions to hypothesis testing (all in German) and they all defined H_0 and H_1 as I just said. So in that case, is the whole process of hypothesis testing Popperian? If the answer is "no": where or how exactly is Popper's theory of falsification embedded in common hypothesis testing in e.g. physics or psychology? Is it at all? – Rainer_Zoufal Jul 26 '21 at 16:55
  • To make things easier: what should I tell my high-school students if they ask why we do the whole null-hypothesis thing? Is it correct to say that this is a result of Popper's falsificationism (because we try to falsify H_1, and we do this by trying to not reject H_0, although what we _really_ hope for is that we are able to reject H_0 and therefore kinda tentatively accept H_1)? – Rainer_Zoufal Jul 26 '21 at 17:21
  • Although it is sometimes possible to falsify $H_1$, not rejecting $H_0$ does not imply that $H_1$ has been falsified. The two most common results of a hypothesis test are probably that both $H_0$ and $H_1$ can explain the data, or that $H_1$ but not $H_0$ can explain the data. When I first introduce hypothesis testing, I teach my students that a normal hypothesis test on $H_0$ tells you literally nothing about $H_1$. The fact that it's often $H_1$ we want to know about is precisely the problem raised by people who say our scientific practice shouldn't be so dependent on hypothesis testing. – Ian Sudbery Jul 26 '21 at 17:39
  • @Rainer_Zoufal I can't add much that I (or @IanSudbery) haven't written before. Historically, null hypothesis testing came up at about the same time as Popper's falsificationism, so one isn't the "result" of the other. The H1 is not normally precisely specified, so it will not be falsified. I have seen instances of testing in physics where the H0 was the scientific hypothesis of interest, but this may be more the exception than the rule. I use a test of a "nothing meaningful" H0 in order to check whether the data hold any evidence that *anything* meaningful is going on, which is not falsificationism. – Christian Hennig Jul 26 '21 at 22:31
  • @Rainer_Zoufal Tests of model assumptions are examples where the H0 is meaningful and of interest. Regarding situations in which the H0 is "nothing meaningful", I generally think that in science we shouldn't "hope to be able to reject the H0", as this invites all kinds of manipulation. Ultimately such tests are legitimate and valuable if it is accepted that rejecting the H0 doesn't license inferring a *specific* H1; rather, it only gives a "rough direction" in which the H0 is in all likelihood violated. If you wanted to falsify the H1, you'd need to specify it more precisely. – Christian Hennig Jul 26 '21 at 22:38
  • @Lewian, IAUI it is possible to have a "NHST" where you are arguing *for* the null hypothesis, but in those cases it is important to show that the power of the test is high, so that it would be a surprise for H0 not to be rejected if it were false (and so you can take its lack of rejection as some level of evidence for the H0). I don't think it is done very often, though, as evaluating the statistical power is more difficult than just running the test the other way round. – Dikran Marsupial Jul 27 '21 at 06:30
  • @DikranMarsupial I think this is more or less equivalent to what I said in item (2), or at least follows a similar logic, and is also related to Mayo's "severity". – Christian Hennig Jul 27 '21 at 09:47
  • @DikranMarsupial I added something to (2) that makes the connection clearer. – Christian Hennig Jul 27 '21 at 09:55
  • The example that came to mind is the argument that there has been "no significant global warming since [cherry-picked start date]", which is a common argument in the climate debate. This would be a reasonable argument if the power of the test were good (but it generally isn't). In that case the minimum effect size is fixed by the expected rate of warming from the climatologists (which is small compared to the noise), so the only variable that can be altered is the duration of the period used to calculate the trend (it needs to be at least 17 years to start to have any relevance). – Dikran Marsupial Jul 27 '21 at 10:09

Too long for comments, so here are my thoughts.

Null Hypothesis Statistical Testing (NHST) is Popperian only in the sense that no amount of corroboration proves a hypothesis correct, so often the best you can do is find out what you can reasonably reject and continue with the hypotheses that have survived the tests thrown at them so far.

Firstly, we should avoid talking of falsifying the null hypothesis and stick to "reject" or "do not reject". Being able to reject the null hypothesis does not mean that we have shown it to be false, just that the observations are unlikely under that hypothesis. The observations may be even more unlikely under the alternative hypothesis! Here is the classic example:

[Image: xkcd's "Frequentists vs. Bayesians" comic, in which a detector that lies by rolling two sixes (probability 1/36) reports that the sun has exploded; the frequentist rejects the null hypothesis at p < 0.05, while the Bayesian bets the sun has not exploded.]

In this case the null hypothesis is almost certainly true even though we have rejected it: the detector was almost certainly giving a random false alarm. This is because the alternative hypothesis was even more unlikely to be true than the null hypothesis: the prior probability of H1 was vastly smaller than that of H0, but the NHST does not take that into account. This is an example where rejecting the null hypothesis is not a failed falsification of the alternative hypothesis.
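
For concreteness, here is a back-of-the-envelope Bayes calculation for the detector example. The prior on H1 below is an illustrative guess on my part, not a figure from the comic:

```python
# Posterior probability that the sun exploded, given the alarm.
p_lie = 1 / 36                 # detector rolls two sixes and lies
prior_h1 = 1e-10               # P(sun just exploded), assumed tiny
prior_h0 = 1 - prior_h1

p_alarm_given_h1 = 1 - p_lie   # detector tells the truth under H1
p_alarm_given_h0 = p_lie       # detector lies under H0

posterior_h1 = (p_alarm_given_h1 * prior_h1) / (
    p_alarm_given_h1 * prior_h1 + p_alarm_given_h0 * prior_h0
)
print(f"P(H1 | alarm) = {posterior_h1:.1e}")  # ~3.5e-09, so H0 still wins
```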

Conversely, if an NHST has low statistical power, then a failure to reject the null does not falsify the alternative hypothesis.
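
A quick simulation illustrates this; the effect size, sample size, and significance level below are illustrative assumptions. With low power, most samples drawn under a true alternative still fail to reject the null:

```python
# Low power: H1 is true (mu = 0.2), but H0: mu = 0 is rarely rejected.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
trials, rejections = 10_000, 0
for _ in range(trials):
    x = rng.normal(loc=0.2, scale=1.0, size=10)  # small n, small effect
    _, p = stats.ttest_1samp(x, popmean=0.0)
    rejections += p < 0.05
print(f"power ≈ {rejections / trials:.2f}")  # roughly 0.09: non-rejection is the norm
```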

As @Dave suggests, sometimes we know for sure a priori that the null hypothesis is false; for example, a coin with two faces is unlikely to be exactly unbiased, i.e. p(head) = p(tail) = 0.5, but we may need a very large number of coin flips to detect the bias that is bound to be present, even in a coin that is to all intents and purposes "unbiased". Testing for normality involves a similar issue in most cases, AFAICS. Rejecting a hypothesis that you know to be false from the outset is not very Popperian, but that doesn't mean that such NHSTs cannot serve a useful purpose.
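
To put a number on "a very large number of coin flips", here is a standard normal-approximation sample-size calculation. The assumed bias of 0.501 and the 80% power target are illustrative choices, not from the answer:

```python
# Sample size needed to detect a tiny coin bias, p = 0.501 vs p0 = 0.5,
# with a two-sided one-proportion z-test (normal approximation).
import numpy as np
from scipy import stats

p, p0, alpha, power = 0.501, 0.5, 0.05, 0.80
z_a = stats.norm.ppf(1 - alpha / 2)   # critical value for two-sided alpha
z_b = stats.norm.ppf(power)           # quantile for the power target
n = ((z_a * np.sqrt(p0 * (1 - p0)) + z_b * np.sqrt(p * (1 - p))) / (p - p0)) ** 2
print(f"n ≈ {n:,.0f} flips")          # on the order of two million
```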

The Quine-Duhem Thesis suggests that in practice it is not that easy to falsify a hypothesis either.

Dikran Marsupial
  • About as easy as collecting $50 from this Bayesian Statistician come morning. – candied_orange Jul 27 '21 at 19:57
  • @candied_orange indeed, there is a whole thread about this excellent cartoon here: https://stats.stackexchange.com/questions/43339/whats-wrong-with-xkcds-frequentists-vs-bayesians-comic – Dikran Marsupial Jul 28 '21 at 06:51

This is second-hand from Richard McElreath, but I think the answer is no: Popper's famous falsification theory was about falsifying experimental hypotheses, not null hypotheses.

llewmills