
Some seem to insist that Statistical Significance and Hypothesis Testing are different concepts$^\dagger$. Maybe some of them could come forward and explain why they think this way? I came across an interesting article which seems to agree there is no substantial difference.

There might be differences in emphasis, history, and culture, but then what are those differences, and are they important?

$^\dagger$I got downvoted here so there must be at least one person thinking these concepts are distinct.

Karolis Koncevičius
kjetil b halvorsen
  • Relevant: Lehmann, E. L. (1993). [The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?](https://link.springer.com/content/pdf/10.1007/978-1-4614-1412-4_19.pdf) *Journal of the American Statistical Association*, 88(424), 1242–1249. – Alexis Oct 31 '21 at 19:02
  • The significance test and hypothesis test methods were developed at different times by different people, and those people argued endlessly in public about the deficiencies of each other's approaches. The former attempts to evaluate the evidence in the data, whereas the latter sets rules for decisions designed to control the rate of erroneous decisions in the long run. They really are very different. Start here: https://stats.stackexchange.com/questions/16218/what-is-the-difference-between-testing-of-hypothesis-and-test-of-significance/16227#16227 – Michael Lew Oct 31 '21 at 20:05
  • Mathematically Neyman/Pearson's approach to hypothesis tests can be thought of as a way to decide which test statistic to use to test a certain null hypothesis. They didn't originally think of it as opposed to Fisher's significance tests, but rather as adding an additional useful element to it. It was Fisher who didn't like the NP approach and tried to distance himself from it. (To be continued.) – Christian Hennig Oct 31 '21 at 20:37
  • Personally I think that Fisher exaggerated the differences. A statistician of today can apply tests sensibly, keeping in mind the good arguments raised on all sides of the discussion, without having to adhere consistently to one side or the other. (Of course some think that tests cannot be sensibly applied at all, but even for that assessment one may not want to distinguish between Fisher, N-P, or hybrid versions of testing.) – Christian Hennig Oct 31 '21 at 20:41
  • @ChristianHennig Your comment "can apply tests sensibly" might be OK for a thoughtful well-trained statistician, but for the non-statistician who uses statistics in support of scientific inferences it is entirely unhelpful. Pretending that there is no difference will only perpetuate the widespread confusion. – Michael Lew Oct 31 '21 at 21:01
  • @MichaelLew Most non-statisticians using statistics need to understand more basic issues with tests first, such as that a non-rejection doesn't mean the null is true, that tests are invalidated if you choose them dependently on the data, that p-values are not probabilities for the null to be true, that statistical significance doesn't imply substantive significance, issues with multiple testing etc., which are important regardless of whether one follows Fisher or N-P or tests in a "hybrid" mode. – Christian Hennig Oct 31 '21 at 21:24
  • I think it comes down to how exactly you intend for us to interpret those two phrases when you use them. I am a bit concerned that most people here seem to be *adding* a word to the phrase *statistical significance* that you didn't use either time (*testing*), thereby answering what seems as if it may be a different question. It would be good if you could clarify. – Glen_b Nov 01 '21 at 02:21
  • I.e. just to make sure ... did you intend the word *testing* to apply to both terms? – Glen_b Nov 01 '21 at 02:29
  • Would you agree that 'statistical significance' and 'significance testing' are related but different concepts? – Sextus Empiricus Nov 01 '21 at 13:00
  • Perhaps one might say that the statistical significance is evaluated using a significance test, whereas it is dichotomised using a hypothesis test. – Michael Lew Nov 01 '21 at 20:06
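The contrast drawn in these comments — a p-value read as a graded measure of evidence (Fisher) versus a pre-set accept/reject rule that controls long-run error rates (Neyman-Pearson) — can be sketched with the same test statistic. This is an illustrative example only, not from the thread; the one-sample z-test and all numbers are hypothetical:

```python
# A minimal sketch: one test statistic, two interpretive traditions.
# Assumes a one-sample z-test with known sigma; all values are hypothetical.
import math

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for a one-sample z-test with known sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # 2 * P(Z > |z|)

# Fisherian (significance-test) reading: the p-value itself is reported
# as a continuous index of evidence against the null hypothesis.
p = z_test_p_value(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)

# Neyman-Pearson (hypothesis-test) reading: alpha is fixed in advance and
# only the binary decision matters; this controls the long-run type I
# error rate but discards the graded evidence.
alpha = 0.05
decision = "reject H0" if p < alpha else "fail to reject H0"

print(round(p, 4), decision)  # prints: 0.0455 reject H0
```

The code is deliberately symmetric: both traditions compute the same number, and the disagreement in the comments above is about what one is licensed to do with it afterwards.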

1 Answer


Two questions on Cross Validated whose answers address this:

What is the difference between "testing of hypothesis" and "test of significance"?

Is the "hybrid" between Fisher and Neyman-Pearson approaches to statistical testing really an "incoherent mishmash"?

Papers that explain in depth with historical context:

Goodman, S. N. (1999). Toward evidence-based medical statistics. 1: The P value fallacy. Annals of Internal Medicine, 130(12), 995–1004. https://pubmed.ncbi.nlm.nih.gov/10383371/

Hurlbert, S., & Lombardi, C. (2009). Final collapse of the Neyman-Pearson decision theoretic framework and rise of the neoFisherian. Annales Zoologici Fennici, 46(5), 311–349.

Lew, M. J. (2012). Bad statistical practice in pharmacology (and other basic biomedical disciplines): you probably don't know P. British Journal of Pharmacology, 166(5), 1559–1567. doi:10.1111/j.1476-5381.2012.01931.x

A paper that explains the difference and also puts it into the context of scientific inference:

Lew M.J. (2019) A Reckless Guide to P-values. In: Bespalov A., Michel M., Steckler T. (eds) Good Research Practice in Non-Clinical Pharmacology and Biomedicine. Handbook of Experimental Pharmacology, vol 257. Springer, Cham. https://doi.org/10.1007/164_2019_286

Michael Lew