28

It is helpful to study the data analysis code of experts. I've recently been browsing GitHub, where a number of people share data analysis code. This includes a few R packages (which are, of course, also available directly from CRAN), as well as several examples of reproducible research, particularly using R (see this R list on GitHub).

  • Who are good people to follow on GitHub to learn about best practices in data analysis?
  • Optionally, what kind of code do they share and why is this useful?
kjetil b halvorsen
Jeromy Anglim

3 Answers

18

Hadley Wickham. He has several exploratory data analysis projects on GitHub that you can look at (e.g., "data-baby-names"), and given the awesomeness of ggplot2/plyr/reshape, I have a default (but admittedly blind) trust in his best practices, particularly with respect to his own packages.

Plus, you get an early heads up on other projects he's working on!

raegtin
  • (+1) He's also working on a set of tutorials on [Advanced R development](https://github.com/hadley/devtools/wiki), very handy! – chl Nov 11 '10 at 08:30
  • @Jeromy In fact, it seems this is merely a way to draft his future textbook (check HW's past tweets). – chl Nov 11 '10 at 09:26
9

I also follow John Myles White's GitHub repository. There are several data-oriented projects, but also interesting stuff for R developers.

chl
7

Diego Valle Jones. His GitHub, especially his analysis of homicides in Mexico, is really interesting.

radek