Questions tagged [reproducible-research]

The research practice of making the full experimental description, all collected data, and all data analysis scripts publicly available, so that published results can be reproduced elsewhere.

Reproducible research refers to scientific findings or results that can be independently replicated based on the methods detailed by the original investigator. It is a cornerstone of the scientific method. For statistical methods, reproducible research involves clearly describing the assumptions, approaches, and tests used in any data analysis. Statistical methods can also be used to assess how reproducible an original set of findings was, based on how closely independent replications match it.

81 questions

2 answers

How much do we know about p-hacking "in the wild"?

The phrase p-hacking (also: "data dredging", "snooping" or "fishing") refers to various kinds of statistical malpractice in which results become artificially statistically significant. There are many ways to procure a "more significant" result,…

asked Mar 09 '16 at 13:14

Silverfish

15 answers

Complete substantive examples of reproducible research using R

The Question: Are there any good examples of reproducible research using R that are freely available online? Ideal Example: Specifically, ideal examples would provide: The raw data (and ideally meta data explaining the data), All R code including…

r references reproducible-research

asked Aug 21 '10 at 04:58

Jeromy Anglim

3 answers

How are we defining 'reproducible research'?

This has come up in a few questions now, and I've been wondering about something. Has the field as a whole moved toward "reproducibility" focusing on the availability of the original data, and the code in question? I was always taught that the core…

reproducible-research philosophical

asked Aug 31 '11 at 03:39

Fomite

8 answers

How do I get people to take better care of data?

My workplace has employees from a very wide range of disciplines, so we generate data in lots of different forms. Consequently, each team has developed its own system for storing data. Some use Access or SQL databases; some teams (to my horror)…

dataset reproducible-research quality-control

asked Oct 21 '10 at 16:26

Richie Cotton

5 answers

Is p-value essentially useless and dangerous to use?

This article "The Odds, Continually Updated" from NY Times happened to catch my attention. To be short, it states that [Bayesian statistics] is proving especially useful in approaching complex problems, including searches like the one the Coast…

hypothesis-testing statistical-significance bayesian p-value reproducible-research

asked Jan 25 '15 at 20:01

SixSigma

6 answers

How to increase longer term reproducibility of research (particularly using R and Sweave)

Context: In response to an earlier question about reproducible research, Jake wrote: One problem we discovered when creating our JASA archive was that versions and defaults of CRAN packages changed. So, in that archive, we also include the…

r reproducible-research project-management

asked Nov 12 '10 at 01:05

Jeromy Anglim

3 answers

Who to follow on github to learn about best practice in data analysis?

It is helpful to study the data analysis code of experts. I've recently been perusing github and there are a number of people sharing data analysis code there. This includes a few R Packages (which of course are available directly from CRAN), but…

r reproducible-research

asked Nov 11 '10 at 06:59

Jeromy Anglim

2 answers

What are some standard practices for creating synthetic data sets?

As context: When working with a very large data set, I am sometimes asked if we can create a synthetic data set where we "know" the relationship between predictors and the response variable, or relationships among predictors. Over the years, I…

modeling reproducible-research synthetic-data

asked Oct 15 '11 at 13:25

Iterator

4 answers

As a reviewer, can I justify requesting data and code be made available even if the journal does not?

As science must be reproducible, by definition, there is increasing recognition that data and code are an essential component of reproducibility, as discussed by the Yale Roundtable for data and code sharing. In reviewing a manuscript for a…

dataset validation reproducible-research references

asked Aug 17 '11 at 16:52

David LeBauer

1 answer

Has the reported state-of-the-art performance of using paragraph vectors for sentiment analysis been replicated?

I was impressed by the results in the ICML 2014 paper "Distributed Representations of Sentences and Documents" by Le and Mikolov. The technique they describe, called "paragraph vectors", learns unsupervised representations of arbitrarily-long…

text-mining natural-language word-embeddings sentiment-analysis reproducible-research

asked Nov 11 '14 at 15:34

bskaggs

1 answer

How to create coloured tables with Sweave and xtable?

I am using Sweave and xtable to generate a report. I would like to add some coloring on a table. But I have not managed to find any way to generate colored tables with xtable. Is there any other option?

r reproducible-research

asked Mar 07 '11 at 11:15

RockScience

1 answer

What if high validation accuracy but low test accuracy in research?

I have a specific question about validation in machine learning research. As we know, the machine learning regime asks researchers to train their models on the training data, choose from candidate models by validation set, and report accuracy on the…

machine-learning cross-validation reproducible-research

asked Apr 22 '15 at 17:34

Mou

3 answers

Hosting options for publicly available data

So you've decided to support the idea of reproducible research and want to make your data available online for people to see and use. The question is, where do you host it? My first inclination is of course the private webspace I have on a…

reproducible-research

asked Nov 02 '11 at 15:33

Fomite

4 answers

Implications of current debate on statistical significance

In the past few years, various scholars have raised a detrimental problem of scientific hypothesis testing, dubbed "researcher degree of freedom," meaning that scientists have numerous choices to make during their analysis that bias towards finding…

hypothesis-testing inference philosophical reproducible-research social-science

asked Nov 08 '13 at 03:16

Heisenberg

1 answer

Why do people use PCA when it has so many issues?

(This is a soft question) I've recently been learning Principal Component Analysis, and it appears to have a lot of issues: You have to transform the data to roughly the same scale before applying PCA, but how the feature scaling should be performed is…

self-study pca multivariate-analysis interpretation reproducible-research

asked May 13 '19 at 02:10

nalzok
