Questions tagged [data-snooping]

Misuse of data analysis to find patterns that can be presented as statistically significant, typically by performing many statistical tests and reporting only those that yield significant results.

Data dredging (also called data fishing, data snooping, data butchery, and p-hacking) is the misuse of data analysis to find patterns in data that can be presented as statistically significant, which dramatically increases the risk of false positives while understating it. This is done by performing many statistical tests on the data and reporting only those that come back with significant results.

Source: Smith & Ebrahim, "Data dredging, bias, or confounding: They can all get you into the BMJ and the Friday papers" (2002), via Wikipedia.
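To make the false-positive inflation concrete, here is a minimal sketch, not taken from the cited source. It assumes every null hypothesis is true, the tests are independent, and each p-value is therefore uniform on [0, 1]. The function names `familywise_error_rate` and `simulate_dredging` are hypothetical, chosen for illustration only.

```python
import random


def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive among independent true-null tests."""
    # Assumption: tests are independent, so P(no false positive) = (1 - alpha)^m.
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_dredging(num_tests, alpha=0.05, num_studies=10_000, seed=0):
    """Fraction of simulated studies that could report a 'significant' finding."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_studies):
        # Assumption: under a true null, a valid p-value is uniform on [0, 1].
        p_values = [rng.random() for _ in range(num_tests)]
        # The dredger keeps only the best-looking test and discards the rest.
        if min(p_values) < alpha:
            hits += 1
    return hits / num_studies


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        print(m, round(familywise_error_rate(m), 3), round(simulate_dredging(m), 3))
```

Under these assumptions a single test is falsely significant 5% of the time at alpha = 0.05, but running 20 tests and reporting only the best one yields a "finding" in roughly 64% of studies. Real analyses usually involve correlated tests, so the exact figures will differ.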

2 questions
3 votes, 2 answers

What is the difference between p-hacking and data mining bias?

From my understanding, data mining bias occurs when someone repeatedly searches through a data set to find statistically significant results. How is this any different from p-hacking? What is the difference between p-hacking and data mining bias?
Flux
2 votes, 1 answer

Ansari-Bradley Test Sensitivity to Median Differences: Should we subtract the median from each group?

The Ansari-Bradley test appears to be ineffective at detecting scale differences when the two distributions have markedly different medians. However, when I subtract the median of each group, I get greatly increased power without maintaining the…
Dave