Jackman (2009) writes on pp. xxxi-xxxii:

Consider researchers analyzing cross-national data in economics, political science, or sociology, say, using national accounts data from the OECD. [...] one can feed the OECD data to a computer program, and have standard errors and significance tests reported for various statistics (e.g. means, correlations, regressions) as usual. But what do those standard errors mean in this context?

He concludes on p. xxxii that "adhering to a frequentist conception of probability in the face of non-repeatable data risks intellectual embarrassment" and is therefore inappropriate.

Is this correct?

Jackman, S. (2009). Bayesian analysis for the social sciences. John Wiley & Sons.

2 Answers


This seems to be a qualitative way of expressing a loss of confidence in the p-values used in frequentist hypothesis testing to quantify the significance of results.

First, p-values are notoriously difficult to interpret/explain. Quoting Andrew Gelman...

The casual view of the P value as posterior probability of the truth of the null hypothesis is false and not even close to valid under any reasonable model, yet this misunderstanding persists even in high-stakes settings. [...] The formal view of the P value as a probability conditional on the null is mathematically correct but typically irrelevant to research goals (hence, the popularity of alternative—if wrong—interpretations).
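
To see Gelman's first point concretely, here is a short simulation (my sketch, not his): when half of all nulls are true and the rest carry a modest effect, the fraction of true nulls among "significant" results is far above 5%. The mixing proportion, effect size, and sample size below are all hypothetical choices.

```python
# Sketch, not Gelman's code: estimate P(null true | p < 0.05) by simulation.
# Hypothetical assumptions: half of all nulls are true, false nulls have a
# modest effect of 0.3, data are Normal(mean, 1) with n = 25 and sigma known.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_obs, effect = 100_000, 25, 0.3

null_true = rng.random(n_experiments) < 0.5        # half the nulls are true
means = np.where(null_true, 0.0, effect)           # true mean per experiment
data = rng.normal(means[:, None], 1.0, (n_experiments, n_obs))

z = data.mean(axis=1) * np.sqrt(n_obs)             # z statistic, sigma = 1 known
p = 2 * stats.norm.sf(np.abs(z))                   # two-sided p-value

sig = p < 0.05
print("P(null true | p < 0.05) ≈", null_true[sig].mean())  # ≈ 0.13, not 0.05
```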

Second, as described in this July 2017 article from Nature, the use of frequentist p-values as a test of significance has in recent years helped to produce a slew of results deemed significant that cannot be reproduced...

Shlomo Argamon, a computer scientist at the Illinois Institute of Technology in Chicago, says... “no matter what confidence level you choose, if there are enough different ways to design your experiment, it becomes highly likely that at least one of them will give a statistically significant result just by chance”

Now, the risk described above can be largely eliminated if you have a simple experimental design that can be repeated, but if your experiment is non-repeatable, then you are stuck with the single p-value you get on the one try.
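
A toy simulation (mine, not Argamon's) makes his point concrete: analyze pure-noise data k different ways and count how often at least one variant clears the 0.05 threshold. The number of variants and the sample sizes are arbitrary.

```python
# Toy simulation, not Argamon's: k analysis variants on pure-noise data.
# With k = 10 independent looks at alpha = 0.05, the chance of at least one
# "significant" result is 1 - 0.95**10, about 40%. All sizes are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies, k, n_obs = 20_000, 10, 30

data = rng.normal(0.0, 1.0, (n_studies, k, n_obs))  # the null is true everywhere
_, p = stats.ttest_1samp(data, 0.0, axis=2)         # one p-value per variant

any_sig = (p < 0.05).any(axis=1)
print("P(at least one p < 0.05):", any_sig.mean())  # ≈ 0.40
```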

These problems with p-values may be correctable, and it is probably a bit of an overstatement to think they are inherent to a "frequentist conception of probability", but it is also true that Bayesian methods are less encumbered by these particular issues.

  • The following goes in the same direction. Somewhat basic, but an interesting start: Bartels, Christian (2014). Positioning Bayesian inference as a particular application of frequentist inference and vice versa. figshare. https://doi.org/10.6084/m9.figshare.867707.v4 – user36160 Oct 01 '17 at 15:54

The argument of "non-repeatability" is a red herring promoted by those on the extreme Bayesian side. After all, both frequentists and Bayesians start at precisely the same place: a model for potentially observable data given by $P(Y|X, \theta)$. This model states that the data we see is but one realization of a (potentially infinite) set of possible data sets. So, while the study may not be physically repeatable, it is certainly considered conceptually repeatable, by people in both camps.
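
A minimal sketch of what "conceptually repeatable" means in practice, under an assumed Normal(θ, 1) model (my illustration, not part of the answer): the frequentist simulates the hypothetical replications explicitly, while the Bayesian combines the same likelihood with a prior on θ.

```python
# Minimal sketch, not from the answer: the same model Y_i ~ Normal(theta, 1)
# read two ways. All numbers (theta = 1, n = 20, prior sd = 10) are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
theta_true, n = 1.0, 20

# Frequentist reading: the observed data set is one draw among many hypothetical
# replications; simulate them and check coverage of the usual 95% interval.
reps = rng.normal(theta_true, 1.0, (10_000, n))
half_width = 1.96 / np.sqrt(n)
covered = (np.abs(reps.mean(axis=1) - theta_true) < half_width).mean()
print("95% CI coverage over replications:", covered)   # ≈ 0.95

# Bayesian reading: one observed data set, the same Normal likelihood, plus a
# conjugate Normal(0, 10^2) prior, giving a Normal posterior for theta.
y = rng.normal(theta_true, 1.0, n)
prior_var, like_var = 10.0**2, 1.0 / n                 # like_var = sigma^2 / n
post_var = 1.0 / (1.0 / prior_var + 1.0 / like_var)
post_mean = post_var * y.mean() / like_var
print(f"posterior for theta: Normal({post_mean:.3f}, {np.sqrt(post_var):.3f}^2)")
```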

By the same token, those on the extreme frequentist side will not like my characterization above, because they don't like the "conceptual" part. But when you get right down to it, except in places like casinos, studies are never precisely repeatable in a physical sense (and there is even an argument to be made about casinos), so the requirement of strict physical replicability would render most of statistics moot.

– BigBendRegion