Sharing information stimulates science. When researchers choose to make their data publicly available, they are allowing their work to contribute far beyond their original findings.
The benefits of data sharing are immense. When researchers make their data public, they increase transparency and trust in their work, they enable others to reproduce and validate their findings, and ultimately, contribute to the pace of scientific discovery by allowing others to reuse and build on top of their data.
"If I have seen further it is by standing on the shoulders of Giants."
Isaac Newton, 1675.
While the benefits of data sharing and open science are clear, sadly 86% of medical research data is never reused. A 2014 survey conducted by Wiley with over 2000 researchers across different fields found that 21% of surveyed researchers did not know where to share their data and 16% did not know how to do so.
In a series of articles on Data Sharing we seek to break down this process for you and cover everything you need to know on how to share your research outputs.
In this first article, we will introduce essential concepts of public data and share six powerful platforms to upload and share datasets.
The best way to publish and share research data is with a research data repository. A repository is an online database that allows research data to be preserved across time and helps others find it.
Apart from archiving research data, a repository will assign a DOI to each uploaded object and provide a web page that tells what it is, how to cite it and how many times other researchers have cited or downloaded that object.
When a researcher uploads a document to an online data repository, a digital object identifier (DOI) will be assigned. A DOI is a globally unique and persistent string (e.g. 10.6084/m9.figshare.7509368.v1) that identifies your work permanently.
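To make the structure of that string concrete, here is a minimal Python sketch that splits a DOI into its prefix and suffix and builds the https://doi.org/ link that resolves it. The function names are illustrative, and the pattern (a prefix of "10." followed by four to nine digits) is a simplifying assumption that covers commonly seen DOIs rather than every valid one.

```python
from __future__ import annotations

import re
from urllib.parse import quote

# Assumed pattern: prefix "10." plus 4-9 digits, then "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9})/(\S+)$")


def split_doi(doi: str) -> tuple[str, str]:
    """Return (prefix, suffix) for a DOI string, or raise ValueError."""
    match = DOI_PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"not a DOI: {doi!r}")
    return match.group(1), match.group(2)


def doi_url(doi: str) -> str:
    """Build the doi.org link that redirects to the object's landing page."""
    prefix, suffix = split_doi(doi)
    # Percent-encode characters that are unsafe in a URL path; keep "/" as-is.
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


prefix, suffix = split_doi("10.6084/m9.figshare.7509368.v1")
print(prefix)  # 10.6084
print(suffix)  # m9.figshare.7509368.v1
print(doi_url("10.6084/m9.figshare.7509368.v1"))
```

The prefix identifies the organization that registered the DOI, and the suffix is chosen by that organization, which is why the example suffix carries the repository's own naming.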
A data repository can assign a DOI to any document, such as a spreadsheet, an image or a presentation, and at different levels of hierarchy, such as a collection of images or a specific chapter in a book.
Each DOI is associated with metadata that provides users with relevant information about an object, such as the title, author, keywords, year of publication and the URL where that document is stored.
The International DOI Foundation (IDF) developed and introduced the DOI in 2000. Registration Agencies, a federation of independent organizations, register DOIs and provide the necessary infrastructure that allows researchers to declare and maintain metadata.
Once a document has a DOI, others can easily cite it. A handy tool to convert DOIs into citations is DOI Citation Formatter.
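The same conversion can also be scripted. The sketch below relies on DOI content negotiation, where a request to https://doi.org/ with an Accept header of text/x-bibliography asks for a formatted citation. The helper names are hypothetical, the "apa" style is an assumed default, and whether a given DOI returns a citation depends on the agency that registered it.

```python
import urllib.request


def citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a request asking doi.org for a formatted citation."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def fetch_citation(doi: str, style: str = "apa") -> str:
    """Send the request and return the citation text. Requires network access."""
    with urllib.request.urlopen(citation_request(doi, style), timeout=30) as response:
        return response.read().decode("utf-8").strip()


# Inspect the request without touching the network.
request = citation_request("10.6084/m9.figshare.7509368.v1")
print(request.full_url)
print(request.get_header("Accept"))
```

Calling fetch_citation with a DOI would then return the citation as plain text, ready to paste into a reference list.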
Now that we have covered the role of a DOI and a data repository, below is a list of 6 data repositories for publishing and sharing research data.
Figshare is an open access data repository where researchers can preserve their research outputs, such as datasets, images, and videos, and make them discoverable.
Figshare allows researchers to upload any file format and assigns a digital object identifier (DOI) for citations.
Mark Hahnel launched Figshare in January 2011. Hahnel first developed the platform as a personal tool for organizing and publishing the outputs of his PhD in stem cell biology. More than 50 institutions now use this solution.
Figshare releases "The State of Open Data" every year to assess the changing academic landscape around open research.
Free accounts on Figshare can upload files of up to 5 GB and get 20 GB of free storage.
Mendeley Data is an open research data repository, where researchers can store and share their data. Datasets can be shared privately between individuals, as well as publicly with the world.
Mendeley's mission is to facilitate data sharing. In their own words, "when research data is made publicly available, science benefits:
- the findings can be verified and reproduced
- the data can be reused in new ways
- discovery of relevant research is facilitated
- funders get more value from their funding investment."
Datasets uploaded to Mendeley Data go into a moderation process where they are reviewed. This ensures the content constitutes research data, is scientific, and does not contain a previously published research article.
Researchers can upload and store their work free of cost on Mendeley Data.
Dryad is a curated general-purpose repository that makes data discoverable, freely reusable, and citable.
Most types of files can be submitted (e.g., text, spreadsheets, video, photographs, software code) including compressed archives of multiple files.
Since a guiding principle of Dryad is to make its contents freely available for research and educational use, there are no access costs for individual users or institutions. Instead, Dryad supports its operation by charging a US$120 fee each time data is published.
Harvard Dataverse is an online data repository where scientists can preserve, share, cite and explore research data.
The Harvard Dataverse repository is powered by the open-source web application Dataverse, developed by the Institute for Quantitative Social Science at Harvard.
Researchers, journals and institutions may choose to install the Dataverse web application on their own server or use Harvard's installation. Harvard Dataverse is open to all scientific data from all disciplines.
Harvard Dataverse is free and has a limit of 2.5 GB per file and 10 GB per dataset.
OSF is a free, open-source research management and collaboration tool designed to help researchers document their project's lifecycle and archive materials. It is built and maintained by the nonprofit Center for Open Science.
Each user, project, component, and file is given a unique, persistent uniform resource locator (URL) to enable sharing and promote attribution. Projects can also be assigned digital object identifiers (DOIs) if they are made publicly available.
OSF is a free service.
Zenodo is a general-purpose open-access repository developed under the European OpenAIRE program and operated by CERN.
Zenodo began as the OpenAIRE orphan records repository, with the mission to provide open science compliance to researchers without an institutional repository, irrespective of their subject area, funder or nation.
Zenodo encourages users to upload their research outputs early in the research lifecycle by allowing uploads to be kept private. Once an associated paper is published, datasets are automatically made open.
Zenodo has no restriction on the file type that researchers may upload and accepts datasets of up to 50 GB.
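The size limits quoted in this article can be turned into a quick check of where a dataset would fit. The Python sketch below is illustrative only: it covers the three repositories for which the article states a limit, treats Figshare's 20 GB of free storage as an upper bound on dataset size, and assumes decimal gigabytes. The names are hypothetical, and current limits should be confirmed with each repository.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

GB = 1000 ** 3  # assumption: decimal gigabytes


@dataclass(frozen=True)
class Limits:
    per_file_gb: Optional[float]     # None where the article states no per-file limit
    per_dataset_gb: Optional[float]  # None where the article states no dataset limit


# Figures as quoted in this article, not fetched from the repositories.
LIMITS = {
    "Figshare": Limits(per_file_gb=5, per_dataset_gb=20),  # 20 GB is total free storage
    "Harvard Dataverse": Limits(per_file_gb=2.5, per_dataset_gb=10),
    "Zenodo": Limits(per_file_gb=None, per_dataset_gb=50),
}


def repositories_that_fit(file_sizes_bytes: list[int]) -> list[str]:
    """Return the repositories whose quoted limits accommodate all the files."""
    if not file_sizes_bytes:
        raise ValueError("at least one file size is required")
    largest = max(file_sizes_bytes) / GB
    total = sum(file_sizes_bytes) / GB
    fits = []
    for name, limits in LIMITS.items():
        if limits.per_file_gb is not None and largest > limits.per_file_gb:
            continue  # the biggest file exceeds the per-file limit
        if limits.per_dataset_gb is not None and total > limits.per_dataset_gb:
            continue  # the files together exceed the dataset limit
        fits.append(name)
    return fits


# A dataset made of a 3 GB file and a 4 GB file.
print(repositories_that_fit([3 * GB, 4 * GB]))  # ['Figshare', 'Zenodo']
```

In this example Harvard Dataverse is excluded because the 4 GB file exceeds its 2.5 GB per-file limit, even though the 7 GB total is under its 10 GB dataset limit.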
Research data can save lives, help develop solutions and maximise our knowledge. Promoting collaboration and cooperation among a global research community is the first step to reduce the burden of wasted research.
Although the waste of research data is an alarming issue with billions of euros lost every year, the future is optimistic. The pressure to reduce the burden of wasted research is pushing journals, funders and academic institutions to make data sharing a strict requirement.
We hope that this series of articles on data sharing can light up the path for the many researchers who are weighing the benefits of making their data open to the world.
The six research data repositories shared in this article are a practical way for researchers to preserve datasets across time and maximize the value of their work.
Cover image by Copernicus Sentinel data (2019), processed by ESA, CC BY-SA 3.0 IGO.
“Harvard Dataverse,” Harvard Dataverse, https://library.harvard.edu/services-tools/harvard-dataverse
“Recommended Data Repositories.” Nature, https://go.nature.com/2zdLYTz
“DOI Marketing Brochure,” International DOI Foundation, http://bit.ly/2KU4HsK
“Managing and sharing data: best practice for researchers.” UK Data Archive, http://bit.ly/2KJHE53
Wikipedia contributors, “Figshare,” Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Figshare&oldid=896290279 (accessed August 20, 2019).
Walport, M., & Brest, P. (2011). Sharing research data to improve public health. The Lancet, 377(9765), 537–539. https://doi.org/10.1016/s0140-6736(10)62234-9
Foster, E. D., & Deardorff, A. (2017). Open Science Framework (OSF). Journal of the Medical Library Association : JMLA, 105(2), 203–206. doi:10.5195/jmla.2017.88
Wikipedia contributors, "Zenodo," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Zenodo&oldid=907771739 (accessed August 20, 2019).
Wikipedia contributors, "Dryad (repository)," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Dryad_(repository)&oldid=879494242 (accessed August 20, 2019).
“How and Why Researchers Share Data (and Why They don't),” The Wiley Network, Liz Ferguson, http://bit.ly/31TzVHs
“Frequently Asked Questions,” Mendeley Data, https://data.mendeley.com/faq