
Is there an established way to assess the prevalence of data mining (as in specification search, not in the machine learning sense) in academic publications?

I vaguely remember hearing about meta-studies that plot p-values against sample sizes, or something of that sort, but I can't find a single reference, and I fail to see right away why this would be informative.

sheß
  • I've never heard it called "specification search", but that seems like a very similar thread and @amoeba has an excellent answer in it. – Matt Krause Apr 07 '16 at 13:57

1 Answer


One study I came across that deals with this examined the distribution of p-values in published economics articles. The authors found that reported results tend to fall just under the common significance thresholds, i.e. p-values cluster just below 5% and 10%. This is similar to what the original question describes, but it is probably not the only or the best-known answer. Link to the article: https://www.aeaweb.org/articles.php?doi=10.1257%2Fapp.20150044
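To make the idea of "clustering just below a threshold" concrete, here is a minimal sketch of one simple way to check for it: count the p-values in a narrow window just below the threshold and in an equally wide window just above it, then ask how surprising that split would be if both sides were equally likely. This is not the method of the linked article. The function name `caliper_test`, the window width, and the example p-values are all assumptions made up for illustration.

```python
from math import comb


def caliper_test(p_values, threshold=0.05, width=0.01):
    """Count p-values just below vs. just above a threshold.

    Hypothetical helper for illustration only. Returns (below, above, tail),
    where tail is the one-sided exact binomial probability of seeing at least
    `below` values under the threshold if each value in the two windows were
    equally likely to land on either side.
    """
    below = sum(1 for p in p_values if threshold - width <= p < threshold)
    above = sum(1 for p in p_values if threshold <= p < threshold + width)
    n = below + above
    if n == 0:
        # No observations near the threshold, so nothing to test.
        return below, above, 1.0
    # P(X >= below) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, k) for k in range(below, n + 1)) / 2 ** n
    return below, above, tail


# Made-up p-values, not taken from any real study.
example = [0.003, 0.021, 0.041, 0.045, 0.048, 0.049, 0.052, 0.130, 0.470]
below, above, tail = caliper_test(example)
print(below, above, tail)
```

A small tail probability suggests more results sit just under the threshold than just over it. The 50/50 benchmark is only an approximation: the distribution of p-values is usually not flat near the threshold, so the choice of window width matters and a modest excess below the cutoff is not by itself evidence of specification search.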

sheß