Editor's note: We have converted these 8 steps into a checklist for easier use: https://www.teamscopeapp.com/data-sharing-checklist
Sharing de-identified research data promotes collaboration within the research community. It lets researchers increase the utility of their data, and it lets participants get the most out of their involvement, because their data can keep having an impact long after the study ends. Ultimately, a collaborative culture saves costs by enabling others to build upon robust and reliable prior research.
Data sharing, however, requires more than a researcher's willingness to contribute their data; more importantly, it requires that they understand how to make it publicly available. A researcher may still decide to share their data at the end of their study, but without proper preparation it might be too late. Consent forms, copyright, and de-identification are just a few prerequisites for sharing your data. Without them, your data could end up siloed from the rest of the world once your study is over.
The success of data sharing depends on how it was planned and whether it was intended from the beginning of a study. When researchers intend to share their data from the outset, they can include data sharing clauses in their informed consent form so that participants understand the extent of data use, anticipate issues with data anonymization, and build in metadata early on to ensure that the research data is usable.
This is our second article on Data Sharing, a series of posts where we cover everything you need to know to maximize the potential of your research.
In this article, we will go through the steps needed to share your research data successfully.
Data sharing starts with planning ahead, yet researchers often become motivated to share their data only when it is too late. Considering these eight steps from the start of your project will allow you to streamline your data sharing plans.
Research is an expensive and laborious activity, and undertaking it is only reasonable if currently available datasets are insufficient for the proposed study. Besides doing a literature search, it is wise to search data repositories to check whether the required data is already available.
Obtaining informed consent for data publication from research participants before data is collected is best practice. It avoids the cost and delay of trying to obtain permission from participants after data collection. As long as participants are fully informed and confident that no identifying data will be shared, it is reasonable to expect that a large number of them will be willing to have their de-identified data shared publicly.
Consent forms should:
A great resource for learning more about sensitive data and how to write consent forms for data sharing is the University of Bristol's sensitive research data bootcamp.
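If each participant's consent is recorded in the dataset, the export step can enforce it mechanically instead of relying on memory. The sketch below assumes a hypothetical three-level consent field (none, restricted, public) captured from the consent form; the level names and the `eligible_records` helper are illustrative and not part of any particular data capture tool.

```python
from enum import IntEnum


class SharingConsent(IntEnum):
    """Hypothetical consent levels; a higher value permits wider sharing."""
    NONE = 0        # data may only be used within the original study
    RESTRICTED = 1  # data may be shared under a restricted-use agreement
    PUBLIC = 2      # de-identified data may be published openly


def eligible_records(records, required):
    """Keep only records whose recorded consent covers the required level."""
    return [r for r in records if r["consent"] >= required]


records = [
    {"id": "P001", "consent": SharingConsent.PUBLIC},
    {"id": "P002", "consent": SharingConsent.RESTRICTED},
    {"id": "P003", "consent": SharingConsent.NONE},
]

public_ids = [r["id"] for r in eligible_records(records, SharingConsent.PUBLIC)]
restricted_ids = [r["id"] for r in eligible_records(records, SharingConsent.RESTRICTED)]
print(public_ids)      # ['P001']
print(restricted_ids)  # ['P001', 'P002']
```

Ordering the levels means a single comparison answers "does this consent cover that use?". This rests on the assumption that consent to public sharing also covers restricted-use sharing; if your consent form treats them separately, record them as separate fields.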
Conducting a pilot study is an effective way to uncover possible issues with all aspects of a project. When the data collected in a project is of poor quality, it undermines both the project's specific objectives and the usability of that data for the research community.
By conducting a small-scale preliminary version of the final project, a researcher can anticipate issues with metadata, file export formats, and overall data integrity.
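A pilot is also a cheap moment to test the export itself. The following sketch reads a pilot CSV export and flags missing, non-numeric, and out-of-range values; the column names and the allowed ranges in `schema` are hypothetical placeholders for whatever your own codebook specifies.

```python
import csv
import io


def check_pilot_data(rows, schema):
    """Return human-readable problems found in pilot rows.

    `schema` maps a variable name to its allowed (min, max) numeric range.
    """
    problems = []
    for i, row in enumerate(rows, start=1):
        for var, (low, high) in schema.items():
            value = row.get(var)
            if value is None or value.strip() == "":
                problems.append(f"row {i}: '{var}' is missing")
                continue
            try:
                number = float(value)
            except ValueError:
                problems.append(f"row {i}: '{var}' is not numeric ({value!r})")
                continue
            if not low <= number <= high:
                problems.append(f"row {i}: '{var}'={number:g} is outside [{low}, {high}]")
    return problems


# A tiny, made-up pilot export with three deliberate problems.
pilot_csv = """participant_id,age,satisfaction
P01,34,4
P02,,5
P03,210,3
P04,41,seven
"""

rows = list(csv.DictReader(io.StringIO(pilot_csv)))
schema = {"age": (18, 99), "satisfaction": (1, 5)}  # hypothetical codebook ranges
problems = check_pilot_data(rows, schema)
for problem in problems:
    print(problem)
```

Running a check like this on pilot exports surfaces problems such as a free-text field where a number was expected before they are repeated across the full study.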
A copyright license is a legal agreement between someone who wants to use a dataset, image, or text and the rights holder who can give permission to use it. Licenses grant permissions under specific terms, and these conditions are intended to safeguard the original researchers' authorship and work.
A Creative Commons (CC) license is one of several public copyright licenses that enable the free distribution of an otherwise copyrighted "work." (Wikipedia)
The most commonly used Creative Commons licenses for research work are:
CC-BY: This license lets others adapt and build upon your work, even for commercial purposes, as long as they credit the original creation. (Creative Commons)
CC0: This license waives all copyright and places the work as fully as possible in the public domain, so that others may freely build upon, enhance, and reuse the work for any purpose without restriction. (Creative Commons)
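Repositories and metadata schemas usually expect the license in machine-readable form. One common convention, assumed here, is to record it as an SPDX license identifier alongside the license URL; `CC-BY-4.0` and `CC0-1.0` are the SPDX identifiers for the two licenses above, while the `choose_license` and `license_fields` helpers are hypothetical.

```python
# SPDX identifiers, names, and canonical URLs for the two licenses discussed above.
LICENSES = {
    "CC-BY-4.0": {
        "name": "Creative Commons Attribution 4.0 International",
        "url": "https://creativecommons.org/licenses/by/4.0/",
    },
    "CC0-1.0": {
        "name": "Creative Commons Zero v1.0 Universal",
        "url": "https://creativecommons.org/publicdomain/zero/1.0/",
    },
}


def choose_license(require_attribution: bool) -> str:
    """CC BY requires reusers to give credit; CC0 waives rights, so credit is not legally required."""
    return "CC-BY-4.0" if require_attribution else "CC0-1.0"


def license_fields(spdx_id: str) -> dict:
    """Build the license portion of a dataset's metadata record."""
    if spdx_id not in LICENSES:
        raise ValueError(f"unsupported license identifier: {spdx_id}")
    info = LICENSES[spdx_id]
    return {"license": spdx_id, "license_name": info["name"], "license_url": info["url"]}


print(license_fields(choose_license(require_attribution=True)))
```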
Before publishing your data, you must ensure that your datasets no longer contain any variables that might lead to the identification of individual respondents. The following variables, if present, must be removed:
and the following variables must be recoded:
Researchers may remove identifiers manually or use data anonymization software such as Amnesia.
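For tabular data, the remove-and-recode pattern can be scripted so that it is applied identically to every export. This is a minimal sketch, not a replacement for a dedicated tool such as Amnesia: the column names in `DIRECT_IDENTIFIERS`, the 10-year age bands, and the postcode truncation are hypothetical choices for illustration rather than the variable lists above.

```python
from datetime import date

# Hypothetical direct identifiers that are dropped outright.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def age_band(birth_date: date, on: date, width: int = 10) -> str:
    """Recode an exact birth date into an age band such as '30-39'."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    age = on.year - birth_date.year - (0 if had_birthday else 1)
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(record: dict, reference_date: date) -> dict:
    """Return a copy of `record` without direct identifiers and with coarsened quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "birth_date" in out:
        out["age_band"] = age_band(out.pop("birth_date"), reference_date)
    if "postcode" in out:
        # Keep only the first (area) part of a space-separated postcode.
        out["postcode_area"] = out.pop("postcode").split()[0]
    return out


raw = {
    "name": "Jane Doe",
    "email": "jane@example.org",
    "phone": "555-0100",
    "birth_date": date(1987, 6, 15),
    "postcode": "AB1 2CD",
    "score": 42,
}
clean = deidentify(raw, reference_date=date(2020, 1, 1))
print(clean)  # {'score': 42, 'age_band': '30-39', 'postcode_area': 'AB1'}
```

Working on a copy leaves the original record untouched, which matters if you also need to archive a restricted-use version of the same data.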
Depending on the nature of the dataset and the feasibility of removing all identifiers, researchers may choose to share their data in a controlled way.
Broadly speaking, researchers can choose between two levels of accessibility:
Public-use: Public-use datasets contain data that has been thoroughly filtered to mitigate the risk of confidentiality violations. All data that could lead to the identification of participants is removed or altered.
Restricted-use: In some cases, removing every sensitive variable from a dataset is not viable, because doing so might cost the researcher the ability to reproduce or extend the original study findings. In addition to a public-use version of their data, researchers can then archive a version containing identifiable information as restricted-use. Access to restricted-use data is granted only to specific parties that have agreed to protect the confidentiality of respondents.
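One way to make the public-use versus restricted-use decision less ad hoc, although the steps above do not prescribe it, is a k-anonymity check: every combination of quasi-identifier values in the released file must be shared by at least k records. The sketch below assumes the quasi-identifiers have already been recoded; the choice of columns and the threshold `k=5` are illustrative assumptions, and passing the check does not by itself guarantee that nobody can be re-identified.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the rarest combination of quasi-identifier values."""
    if not records:
        raise ValueError("no records to check")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def suggested_access_level(records, quasi_identifiers, k=5):
    """Suggest public-use only if the dataset is k-anonymous on the given columns."""
    if smallest_group_size(records, quasi_identifiers) >= k:
        return "public-use"
    return "restricted-use"


quasi = ["age_band", "postcode_area"]
common = [{"age_band": "30-39", "postcode_area": "AB1", "score": s} for s in range(5)]
rare = [{"age_band": "90-99", "postcode_area": "ZZ9", "score": 7}]

print(suggested_access_level(common, quasi))         # public-use
print(suggested_access_level(common + rare, quasi))  # restricted-use
```

A restricted-use result does not mean the data cannot be shared; it signals that further recoding is needed or that the file belongs behind an access agreement.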
The FAIR Data Principles are a set of guiding principles that make data findable, accessible, interoperable, and reusable. These principles provide guidance to data producers and publishers on how to maximize the utility of research data.
Metadata is one of the fundamental building blocks of the FAIR Data Principles. Metadata is data that provides descriptive information about another entity, such as a dataset. A dataset that meets the FAIR principles has rich and precise metadata describing the dataset and its variables.
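Variable-level metadata is easiest to keep accurate when it lives next to the data in a machine-readable file. The sketch below writes a small data dictionary as JSON and flags columns that have no documentation; the field names are hypothetical and do not follow any particular metadata standard, so they would need to be mapped to your repository's required schema (for example DataCite or DDI) before depositing.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Variable:
    name: str
    label: str
    data_type: str
    unit: str = ""
    allowed_values: list = field(default_factory=list)


@dataclass
class DatasetMetadata:
    title: str
    creators: list
    description: str
    license: str
    variables: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested Variable objects into dictionaries.
        return json.dumps(asdict(self), indent=2)


def undocumented_columns(columns, metadata):
    """Columns present in the data file but missing from the data dictionary."""
    documented = {v.name for v in metadata.variables}
    return sorted(set(columns) - documented)


meta = DatasetMetadata(
    title="Example pilot survey (de-identified)",
    creators=["Example Research Team"],
    description="Hypothetical survey used to illustrate variable-level metadata.",
    license="CC-BY-4.0",
    variables=[
        Variable("age_band", "Participant age in 10-year bands", "string",
                 allowed_values=["20-29", "30-39", "40-49"]),
        Variable("score", "Total questionnaire score", "integer", unit="points"),
    ],
)

print(meta.to_json())
print(undocumented_columns(["age_band", "score", "postcode_area"], meta))  # ['postcode_area']
```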
Another essential element of the FAIR principles is persistent, unique identifiers. Each dataset should be assigned a unique identifier that allows others to track its history and cite it. A widely used identifier for datasets is the Digital Object Identifier (DOI).
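Because DOIs get cited in several forms (bare, prefixed with `doi:`, or as a resolver link), it helps to normalize them before adding them to metadata or a reference list. This sketch assumes a pattern based on the one Crossref suggests for matching modern DOIs, which covers the vast majority but not every DOI ever registered; the DOI in the example is made up.

```python
import re

# Based on the pattern Crossref suggests for matching modern DOIs.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:a-z0-9]+$", re.IGNORECASE)

RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(text: str) -> str:
    """Strip common prefixes and return the DOI in lower case (DOIs are case-insensitive)."""
    doi = text.strip()
    for prefix in RESOLVER_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {text!r}")
    return doi.lower()


def doi_link(text: str) -> str:
    """Build the resolvable https://doi.org/ link used when citing a dataset."""
    return "https://doi.org/" + normalize_doi(text)


print(doi_link("doi:10.1234/Example.Dataset.1"))  # https://doi.org/10.1234/example.dataset.1
```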
Lastly, choose a research data repository. Using a repository will allow your dataset to be preserved over time, be findable by others, and easily citable.
There are institution-specific, discipline-specific, and general-purpose data repositories. They provide an online interface where researchers can search for and discover data, though not necessarily obtain direct access to it if the dataset is restricted-use.
We recently published a list of six general-purpose data repositories, some of which are free of charge.
Data sharing can be a fascinating endeavor for researchers. It allows them to improve transparency in their findings, gain more visibility, and enhance the impact of their work.
Data sharing requires proper planning. When data sharing plans begin as early as proposal writing, a research team can make sure that consent forms do not block data sharing, that the research tools will yield high-quality data, and that the output files carry the metadata needed to make them useful to others.
The eight steps shared in this article give researchers an understanding of the considerations they must keep in mind when sharing their data in a way that acknowledges and safeguards the rights of participants, allows others to reuse the data, and enables proper attribution and citation.
“Preparing Data for Sharing: Guide to Social Science Data Archiving.” Amsterdam: Pallas Publications, 2010. (CC BY-NC-SA 3.0)
“About The Licenses.” Creative Commons, https://creativecommons.org/licenses/ (CC BY 4.0)
“Sensitive research data bootcamp.” University of Bristol, https://data.blogs.bristol.ac.uk/bootcampsd/