Questions tagged [project-management]

Organizing computational work on *statistical* projects; use for questions about data storage, data sharing, code repositories, etc. Note that questions about programming or unrelated to statistics are off-topic.

29 questions
91
votes
7 answers

How to efficiently manage a statistical analysis project?

We often hear of project management and design patterns in computer science, but less frequently in statistical analysis. However, it seems that a decisive step toward designing an effective and durable statistical project is to keep things…
chl
  • 50,972
  • 18
  • 205
  • 364
32
votes
7 answers

Why is a comma a bad record separator/delimiter in CSV files?

I was reading this article and I'm curious about the proper answer to this question. The only thing that comes to my mind is that in some countries the decimal separator is a comma, which may cause problems when sharing data in CSV, but I'm…
David Gasquez
  • 498
  • 1
  • 5
  • 11
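To make the decimal-comma concern in the excerpt above concrete, here is a minimal sketch using Python's standard `csv` module. The sample values (`3,14` and `2,72`) and the choice of a semicolon as the alternative delimiter are assumptions made for illustration; they do not come from the question or its answers.

```python
import csv
import io

# Hypothetical record: two values written with a decimal comma.
row = ["3,14", "2,72"]

# Joined naively with commas, the two values are read back as four fields.
naive_line = ",".join(row)
naive_fields = next(csv.reader(io.StringIO(naive_line)))

# With a semicolon delimiter, the decimal commas survive parsing intact.
safe_line = ";".join(row)
safe_fields = next(csv.reader(io.StringIO(safe_line), delimiter=";"))

# Convert the decimal-comma strings to floats after parsing.
values = [float(field.replace(",", ".")) for field in safe_fields]
```

Quoting each field would be another way to avoid the ambiguity; the semicolon is used here only because it keeps the example short.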
31
votes
6 answers

How to increase longer term reproducibility of research (particularly using R and Sweave)

Context: In response to an earlier question about reproducible research, Jake wrote: One problem we discovered when creating our JASA archive was that versions and defaults of CRAN packages changed. So, in that archive, we also include the…
Jeromy Anglim
  • 42,044
  • 23
  • 146
  • 250
23
votes
4 answers

What are efficient ways to organize R code and output?

I am looking for input on how others organize their R code and output. My current practice is to write code in blocks in a text file as such: #================================================= # 19 May 2011 date() # Correlation analysis of…
DQdlM
  • 1,039
  • 2
  • 9
  • 20
22
votes
5 answers

How to keep exploratory analyses of large datasets in check?

When I start an exploratory analysis on a large data set (many samples, many variables), I often find myself with hundreds of derived variables, and tonnes of different plots, and no real way to keep track of what's going where. Code ends up like…
naught101
  • 4,973
  • 1
  • 51
  • 85
18
votes
10 answers

Strategy for editing comma separated value (CSV) files

When I work on data analysis projects I often store data in comma- or tab-delimited (CSV, TSV) data files. While data often belongs in a dedicated database management system, for many of my applications this would be overdoing things. I can edit…
Jeromy Anglim
  • 42,044
  • 23
  • 146
  • 250
17
votes
5 answers

Simple, reliable, open, and interoperable plain text format for storing data

In a previous question I asked about tools for editing CSV files. Gavin linked to a comment on R Help by Duncan Murdoch suggesting that Data Interchange Format is a more reliable way to store data than CSV. For some applications a dedicated database…
Jeromy Anglim
  • 42,044
  • 23
  • 146
  • 250
15
votes
3 answers

What is a practically good data analysis process?

I would like to know, or have references on, the analysis process most statistical data analysts go through for each data analysis project. If I make a "list", to complete a data analysis project, an analyst has to: first collect requirements for…
Tae-Sung Shin
  • 655
  • 1
  • 9
  • 22
11
votes
3 answers

Improving variable names in a dataset

Good variable names are: a) short / easy to type, b) easy to remember, c) understandable / communicative. Am I forgetting anything? Consistency is something to look for. The way I would put it is that consistent naming conventions contribute…
Michael Bishop
  • 2,171
  • 3
  • 21
  • 31
9
votes
1 answer

Statistical project directory structure with multiple languages (e.g., R and Splus)?

Building on the post How to efficiently manage a statistical analysis project and the ProjectTemplate package in R... Q: How do you build your statistical project directory structure when multiple languages feature heavily (e.g., R AND Splus)? Most…
lowndrul
  • 2,057
  • 1
  • 18
  • 20
8
votes
5 answers

Preserving comments on graphs for exploratory data analysis

In performing exploratory data analysis, I will often print out the graphs and write out comments/annotations etc. Do people have suggestions for a better electronic methodology? I am especially interested in python/R. I am looking for something…
8
votes
5 answers

What is a good general purpose plain text data format like that used for BibTeX?

Context: I'm writing a few multiple choice practice questions and I'd like to store them in a simple plain text data format. I've previously used tab delimited, but that makes editing in a text editor a bit awkward. I'd like to use a format a bit…
Jeromy Anglim
  • 42,044
  • 23
  • 146
  • 250
7
votes
2 answers

Repositories and data analysis projects

Context: I've recently adopted version control as part of my data analysis work ("finally", I may hear you saying; see my earlier question on SO). This prompted me to think more about repositories and the directory structure I use for my projects. My…
Jeromy Anglim
  • 42,044
  • 23
  • 146
  • 250
7
votes
3 answers

Workflow for using R with LyX for a statistical analysis?

For some time, I wanted to stop copy-pasting my R results into Word, but climbing the LaTeX mountain seemed too much to be worth it. Recently, I came to discover LyX, as a layman's solution for people like me who do not wish to code their text, but…
Tal Galili
  • 19,935
  • 32
  • 133
  • 195
6
votes
5 answers

Comparing reproducible research strategies: brew or Sweave vs. R2HTML

This is a bit more of a "think about it" question, but I see it as an important one to ask. I have been struggling for the past few days with having a more reproducible-research-like workflow. I am confused by the two different strategies for writing…
Tal Galili
  • 19,935
  • 32
  • 133
  • 195