Hosting options for publicly available data

Question

So you've decided to support the idea of reproducible research and want to make your data available online for people to see and use. The question is, where do you host it?

My first inclination is of course the private webspace I have on a university server, but these things aren't actually all that persistent: if I leave, the directory stays open for only a very short period of time before it vanishes. Hardly the right setting for keeping data available for people to use and work with in the future.

Do you use something like GitHub or SourceForge? Or another service?

The data in question is the output of some simulations of very narrow interest - so I don't necessarily think somewhere like InfoChimps or another one of the public data repositories is the right home for it. This is less "You can learn things with this code!" and more "You can replicate Figure 3 in this paper".

Relevant, perhaps duplicate: http://stats.stackexchange.com/questions/10045/free-public-interest-data-hosting — Matt Parker, Nov 02 '11 at 15:37
Absolutely relevant - adding some details that suggest why I didn't think it was a duplicate. — Fomite, Nov 02 '11 at 15:38
@EpiGrad: What kind of data do you have in mind? If it is source code related to your research project, you can attach it to your _arXiv_ preprint. — Piotr Migdal, Nov 02 '11 at 15:58
@PiotrMigdal Ideally, I'd like the data to be able to hang out for several years, long enough for the usual paper citation propagation etc. to work out. I'd attach it to an *arXiv* preprint if only my field used it ;) — Fomite, Nov 02 '11 at 16:00
@EpiGrad Then maybe a good place to search is [Open Data](http://en.wikipedia.org/wiki/Open_data) as an aspect of the Open Science - http://michaelnielsen.org/blog/open-science/. — Piotr Migdal, Nov 02 '11 at 16:20
If not arXiv, maybe your favorite journal supports adding supplementary data? — , Nov 02 '11 at 16:29
I've checked with the journal in question, we'll see what they say :) — Fomite, Nov 02 '11 at 16:39
@mbq Felt like sharing this update. I checked with the journal that the paper behind this question went to, to see if they could host the data. Turns out their publishing platform can't handle .csv files, or indeed any non-text files. — Fomite, Apr 04 '12 at 02:03

score 4 · Accepted Answer · edited Apr 13 '17 at 12:44

One simple option is GitHub.

I use it a bit to share data and data analysis code. A few good examples of others sharing code and data on the site are listed on this question.

Benefits of GitHub

Uploading is easy once you are familiar with git, and you may as well use git for your version control needs anyway.
You can use gists for simple single files.
It's easy for others to download single or multiple files as an archive.
It has a good amount of free storage.
Source code can be browsed on the web.
And more...
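
To make the download point concrete, here is a minimal sketch of how a reader might build links to a single data file or to a whole-repository archive. It assumes the URL patterns GitHub uses for public repositories as I understand them at the time of writing; the owner, repository, branch, and file names are hypothetical placeholders, not a real project.

```python
from urllib.parse import quote


def raw_file_url(owner: str, repo: str, ref: str, path: str) -> str:
    """URL that serves one file from a public GitHub repository as-is.

    Assumes the raw.githubusercontent.com URL layout; `ref` is a branch,
    tag, or commit hash.
    """
    return (
        "https://raw.githubusercontent.com/"
        f"{quote(owner)}/{quote(repo)}/{quote(ref)}/{quote(path)}"
    )


def archive_url(owner: str, repo: str, branch: str) -> str:
    """URL of a zip archive containing every file on the given branch."""
    return (
        "https://github.com/"
        f"{quote(owner)}/{quote(repo)}/archive/refs/heads/{quote(branch)}.zip"
    )


# Hypothetical repository and file names, for illustration only.
print(raw_file_url("example-user", "paper-simulations", "main", "data/figure3.csv"))
print(archive_url("example-user", "paper-simulations", "main"))
```

Pointing `ref` at a tag or commit hash rather than a branch name should keep the link tied to the exact version of the data used in the paper, which is probably what you want for a "replicate Figure 3" use case.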

Of course, GitHub isn't perfect for data. I can see the merits of using a more permanent institutional repository or some other dedicated tool for more serious archiving.

This is actually the solution I went with. Part of the problem with an institutional repository is that what institution I'm at is in flux, and the data isn't really important enough for one of the big data warehouses. — Fomite, Nov 09 '11 at 06:14

score 4 · Answer 2 · answered Feb 19 '12 at 18:28

Another option seems to be Dataverse, which is available both as a service and as open source software. I have not tried it, though.

Karsten W.

score 2 · Answer 3 · answered Nov 02 '11 at 16:37

One possibility for those in academe is a campus digital repository, often hosted by the campus library (to me a logical home for datasets that accompany publications).

A popular (free) digital repository is DSpace which, to my understanding, can host data sets. But this is a service that someone in your institution must host.
