So you've decided to support the idea of reproducible research and want to make your data available online for people to see and use. The question is, where do you host it?

My first inclination is, of course, the private webspace I have on a university server, but these things aren't all that persistent: if I leave, the directory stays open for only a very short time before it vanishes. Hardly the right setting for keeping data available for people to use and work with in the future.

Do you use something like GitHub or SourceForge? Or another service?

The data in question is the output of some simulations of very narrow interest - so I don't necessarily think somewhere like InfoChimps or another one of the public data repositories is the right home for it. This is less "You can learn things with this code!" and more "You can replicate Figure 3 in this paper".

Fomite
  • Relevant, perhaps duplicate: http://stats.stackexchange.com/questions/10045/free-public-interest-data-hosting – Matt Parker Nov 02 '11 at 15:37
  • Absolutely relevant - adding some details that suggest why I didn't think it was a duplicate. – Fomite Nov 02 '11 at 15:38
  • @EpiGrad: What kind of data do you have in mind? If it is source code related to your research project, you can attach it to your _arXiv_ preprint. – Piotr Migdal Nov 02 '11 at 15:58
  • @PiotrMigdal Ideally, I'd like the data to be able to hang out for several years, long enough for the usual paper citation propagation etc. to work out. I'd attach it to an *arXiv* preprint if only my field used it ;) – Fomite Nov 02 '11 at 16:00
  • @EpiGrad Then maybe a good place to search is [Open Data](http://en.wikipedia.org/wiki/Open_data) as an aspect of the Open Science - http://michaelnielsen.org/blog/open-science/. – Piotr Migdal Nov 02 '11 at 16:20
  • If not arXiv, maybe your favorite journal supports adding supplementary data? –  Nov 02 '11 at 16:29
  • I've checked with the journal in question, we'll see what they say :) – Fomite Nov 02 '11 at 16:39
  • @mbq Thought I'd share an update. I checked with the journal that published the paper behind this question to see whether they could host the data. It turns out their publishing platform can't handle .csv files, or indeed any non-text files >.< – Fomite Apr 04 '12 at 02:03

3 Answers

One simple option is GitHub.

I use it a bit to share data and data analysis code. A few good examples of others sharing code and data on the site are listed on this question.

Benefits of GitHub

  • Uploading is easy once you are familiar with git, which you may as well use for version control anyway.
  • You can use gists for simple single files.
  • It's easy for others to download single or multiple files as an archive.
  • It has a good amount of free storage.
  • Source code can be browsed online.
  • and more...
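
To make the download point concrete, here is a minimal Python sketch of how a reader might fetch one data file pinned to a tagged release and check that it is the file the paper used. The owner, repository, tag, and file path are hypothetical placeholders, and the checksum step is my own suggestion rather than anything GitHub requires; the sketch only assumes the usual `raw.githubusercontent.com/OWNER/REPO/REF/PATH` layout for raw files.

```python
import hashlib
from urllib.parse import quote
from urllib.request import urlopen


def raw_url(owner: str, repo: str, ref: str, path: str) -> str:
    """Build the raw-file URL for `path` at a fixed tag or commit `ref`.

    Pinning to a tag or commit (rather than a moving branch) means the
    link keeps pointing at the exact version used for the paper.
    """
    return (
        "https://raw.githubusercontent.com/"
        f"{quote(owner)}/{quote(repo)}/{quote(ref)}/{quote(path)}"
    )


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(url: str, expected_sha256: str) -> bytes:
    """Download `url` and raise ValueError if its checksum does not match."""
    with urlopen(url) as response:
        data = response.read()
    actual = sha256_hex(data)
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch: got {actual}")
    return data


# Hypothetical repository and file, for illustration only.
url = raw_url("example-user", "paper-simulations", "v1.0", "data/figure3.csv")
print(url)
```

Publishing the expected checksum alongside the paper would let someone replicating a figure confirm they have the same simulation output, wherever the file ends up being hosted.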

Of course, GitHub isn't perfect for data. I can see the merits of using a more permanent institutional repository or some other dedicated tool for more serious archiving.

Jeromy Anglim
  • This is actually the solution I went with. Part of the problem with an institutional repository is that which institution I'm at is in flux, and the data isn't really important enough for one of the big data warehouses. – Fomite Nov 09 '11 at 06:14

Another option seems to be Dataverse, which is available both as a service and as open source software. I have not tried it, though.

Karsten W.

One possibility for those in academe is a campus digital repository, often hosted by the campus library (to me, a logical home for datasets that accompany publications).

A popular (free) digital repository is DSpace, which, to my understanding, can host data sets. But this is a service that someone at your institution must host.

MannyG