Questions tagged [p-hacking]

Misuse of data analysis to find patterns that can be presented as statistically significant, typically by performing many statistical tests and reporting only those that yield significant results.

Data dredging (also data fishing, data snooping, data butchery, and p-hacking) is the misuse of data analysis to find patterns in data that can be presented as statistically significant, thus dramatically increasing and understating the risk of false positives. This is done by performing many statistical tests on the data and only reporting those that come back with significant results.

Source: Smith & Ebrahim "Data dredging, bias, or confounding: They can all get you into the BMJ and the Friday papers" (2002) via Wikipedia.
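
The inflation of false positives described above can be illustrated with a small simulation. This is a minimal sketch, not taken from the cited source: it assumes 20 independent two-group comparisons per "study", no true effect anywhere, normally distributed data with known unit variance (so a z-test applies), and an analyst who reports only the smallest p-value. All function names and parameter values are hypothetical.

```python
import math
import random


def two_sided_p_value(sample_a, sample_b):
    """Two-sided z-test p-value for a difference in means.

    Assumes every observation has known variance 1 (an assumption of this sketch).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    diff = sum(sample_a) / n_a - sum(sample_b) / n_b
    z = diff / math.sqrt(1.0 / n_a + 1.0 / n_b)
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def dredge_once(rng, n_tests=20, n_per_group=30):
    """Run n_tests comparisons on pure noise and return only the smallest p-value."""
    p_values = []
    for _ in range(n_tests):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        p_values.append(two_sided_p_value(a, b))
    return min(p_values)


def familywise_error_rate(n_studies=1000, n_tests=20, alpha=0.05, seed=0):
    """Fraction of null studies in which at least one test looks significant."""
    rng = random.Random(seed)
    hits = sum(dredge_once(rng, n_tests) < alpha for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    simulated = familywise_error_rate()
    theoretical = 1 - (1 - 0.05) ** 20
    print(f"simulated: {simulated:.3f}, theoretical: {theoretical:.3f}")
```

Under these assumptions the chance that at least one of 20 independent tests falls below 0.05 is 1 - 0.95^20, roughly 64%, and the simulated rate should land close to that even though every individual test has a 5% false positive rate.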

12 questions
10 votes, 3 answers

Why is ANOVA not p-hacking?

Say we have some data with many parameters. As an example let's say I'm a not-so-ethical journalist working for a food website and I'm looking to write some clickbait article "backed by science" about how some food or lifestyle is good/bad for…
gazm2k5
10 votes, 3 answers

Is this p-hacking?

I'm currently looking into the gender pay gap using data from Glassdoor (found via Kaggle). The dataset has columns for gender, age, performance evaluations of employees, seniority, pay etc. For context: I have learned a lot of Data Science/Machine…
10 votes, 2 answers

Increasing sample size to obtain width of CI / SE: p-hacking?

I'm involved in running experiments, where we want to obtain sufficient sample size to obtain a certain width of CI (or equivalently a certain power). We currently run a pilot, of a few hundred units, calculate the variance (we ignore the size of…
Jeremy Miles
5 votes, 1 answer

Stop collecting data when our confidence interval is sufficiently narrow?

I am running an experiment that is unpleasant and time-consuming, so I want to minimize the number of times I have to do it. I have done a sample size calculation to determine that I need $100$ repetitions of the experiment. A colleague noticed how…
Dave
4 votes, 1 answer

Bonferroni correction versus the F-test

Many sources emphasize the importance of the F-test p-value in multiple regression and justify this in terms of p-hacking. It's kind of intuitive that if you can't reject the null hypothesis that all coefficients are 0, then it's silly to conclude…
zkurtz
4 votes, 1 answer

ROC-style curves for calculating sample size, power, alpha, and effect size

I found an awesome R package called pwr that does all sorts of calculations about sample sizes, power, effect sizes, and so on, and I've been playing. I have a number of tests that I've run. Now I want to know what kind of power to reject I can get.…
Dave
3 votes, 1 answer

Example of multiple comparisons problem

Pharma companies can get FDA approvals on multiple indications for a single drug, e.g. different kinds of cancers. I was reading about the multiple comparisons problem, i.e. an effective inflation of the type 1 error. If a company tests for whether…
3 votes, 2 answers

What is the difference between p-hacking and data mining bias?

From my understanding, data mining bias occurs when someone repeatedly searches through a data set to find statistically significant results. How is this any different from p-hacking? What is the difference between p-hacking and data mining bias?
Flux
3 votes, 1 answer

Does running multiple similar models lead to p-hacking?

In my analysis I am testing a lot of similar models. I have two variables (let's call them A and B) and a whole bunch of other variables (C1...C10). All models have the same approach: It is always looking whether there is an interaction effect…
heyho
2 votes, 1 answer

Does principal component analysis (PCA) lead to p-hacking?

My knowledge of principal component analysis is only conceptual; I know nothing about the nuts and bolts of how it works. I learned about it from its use in sociolinguistics, as in Horvath & D. Sankoff (1987), and what I gather from their…
0 votes, 0 answers

Data dredging in machine learning by training multiple models

Many machine learning papers I read follow something like this procedure: Split into test and train set, train different models on the train set and evaluate their performance on the test set, report the scores of the models on these test…
PascalIv
0 votes, 1 answer

Is adding batches of samples to a non-stat sig sample p-hacking?

Here's my scenario. I take 1000 samples each from my control and treatment. I note that the difference in means is not statistically significant (stat-sig) under null hypothesis testing. I then add 1000 more samples each to control and…
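
One reading of the batch-adding scenario in the last excerpt can be checked by simulation. This is a rough sketch under stated assumptions, since the excerpt is truncated: there is no true difference between control and treatment, observations are normal with known unit variance, the test is re-run after each added batch of 1000 per arm, and the experimenter stops as soon as a look is significant. All names and defaults below are hypothetical.

```python
import math
import random


def z_test_p_value(mean_control, mean_treatment, n_per_arm):
    """Two-sided p-value for a difference in means, assuming unit variance per observation."""
    z = (mean_treatment - mean_control) / math.sqrt(2.0 / n_per_arm)
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejects_with_peeking(rng, batch_size=1000, max_looks=2, alpha=0.05):
    """Return True if any look at the accumulating null data is significant."""
    total_control = 0.0
    total_treatment = 0.0
    n_per_arm = 0
    batch_sd = math.sqrt(batch_size)
    for _ in range(max_looks):
        # Shortcut: the sum of batch_size independent N(0, 1) draws is
        # N(0, batch_size), so each batch total is drawn directly.
        total_control += rng.gauss(0.0, batch_sd)
        total_treatment += rng.gauss(0.0, batch_sd)
        n_per_arm += batch_size
        p = z_test_p_value(total_control / n_per_arm,
                           total_treatment / n_per_arm,
                           n_per_arm)
        if p < alpha:
            return True  # stop early and declare significance
    return False


def false_positive_rate(max_looks, n_experiments=20000, seed=0):
    """Share of null experiments that end with a 'significant' result."""
    rng = random.Random(seed)
    hits = sum(rejects_with_peeking(rng, max_looks=max_looks)
               for _ in range(n_experiments))
    return hits / n_experiments


if __name__ == "__main__":
    print("one look :", false_positive_rate(max_looks=1))
    print("two looks:", false_positive_rate(max_looks=2))
```

With no true effect, a single look rejects about 5% of the time, while allowing a second look after adding a batch pushes the rate above the nominal level. That is the usual argument for fixing the sample size in advance or applying a sequential-testing correction.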