Data articles: what are they and how can they benefit me?
What are data articles?
Data articles are scientific publications dedicated to describing datasets. Unlike the more ubiquitous and familiar research articles, they do not present or discuss theories, or make research claims. Rather, they link to (publicly available) datasets and provide comprehensive details to facilitate their reuse.
Why are data articles important?
In recent decades, it came to light that the results described in many research articles could not be reproduced, a problem now known as the reproducibility crisis. In the years since, scientific stakeholders (funders, institutions, publishers, policy makers) have been developing various approaches to combat it. Data articles are one such approach employed by publishers. Modelling data articles on traditional research articles makes peer review of data possible, thereby increasing the quality of, and confidence in, the data.
Additionally, there is potential for great savings in resources and time if more of the data underlying research claims are made openly available for others to use. It has been estimated that irreproducible biology research costs US $28 billion per year.
What advantages do data articles offer?
Though a vast majority of researchers deem open sharing of data important, many are deterred by the lack of credit for data sharing and the additional time it takes to do it properly. A data article is citable, and therefore offers a means to credit a researcher specifically for the data they produce and share. Furthermore, the in-depth descriptions of data production methods and technical validation not only facilitate reproducibility, but can also instruct others on technical best practices that might otherwise only be learned through lengthy and costly trial and error.
The components of a data article
Proper data citation is the backbone of a data paper, linking it to the dataset’s landing page in the data repository via a persistent identifier (PID) such as a DOI or identifiers.org link.
Each dataset should have its own reference in the References section. This practice allows authors and data-hosting repositories to be credited for their contributions.
The Data Availability Section (also known as Availability of data and materials, or Data Records) is where readers can go for information on the accessibility of your data. To give a comprehensive data overview, it should list and cite not only all datasets you generated, but also any you used. Additionally, as the technical infrastructure of data citation is still in its infancy, resolvable PIDs (DOI, accession ID) should also be included in-text.
When a data paper describes data that underlies a research paper, the research paper is referred to as a related publication. If related publications exist, they should be cited from the data paper so readers of the data paper can see how the data has been used.
A Methods section should describe how the data were generated in enough detail that readers could generate identical data under the same conditions.
Typical article components that do not appear in a data article are: Results, Discussion, Conclusions and Supplementary Information.
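To illustrate what a resolvable PID looks like in practice, here is a minimal Python sketch that turns a bare DOI or an identifiers.org compact identifier into a link that could be placed in a Data Availability Section. The function names, the DOI `10.1234/example.dataset` and the prefix `exampledb` are hypothetical placeholders chosen for illustration, not real records. The sketch only builds the link text and does not check that the identifier is actually registered.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"
IDENTIFIERS_ORG = "https://identifiers.org/"

# Common ways a DOI is written; all are normalised to the bare "10.xxxx/..." form.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a resolvable https://doi.org/ link for a DOI (hypothetical helper)."""
    doi = doi.strip()
    for prefix in _DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters, but keep the '/' that separates
    # the registrant prefix from the suffix.
    return DOI_RESOLVER + quote(doi, safe="/")


def compact_id_to_url(prefix: str, accession: str) -> str:
    """Return an identifiers.org link of the form prefix:accession (hypothetical helper)."""
    return f"{IDENTIFIERS_ORG}{prefix}:{accession}"


# Placeholder identifiers, for illustration only.
print(doi_to_url("doi:10.1234/example.dataset"))
print(compact_id_to_url("exampledb", "ABC123"))
```

Writing the full link in the text, alongside the formal entry in the References section, means readers can reach the dataset's landing page even where data citations are not yet machine-linked.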
Scientific Data is Springer Nature’s flagship data publication. Its in-depth Data Descriptor articles benefit from focused peer review that evaluates the technical quality and completeness of the Data Descriptor and its associated datasets.
BMC Research Notes publishes short reports to provide an inclusive forum for valuable data and research observations. Its Data Note article type focuses on brief descriptions of datasets and rapid peer review.
BMC Genomic Data also publishes Data Notes, but limited to genomic, transcriptomic and high-throughput genotype data.
Poster image attribution: https://www.flickr.com/photos/gbsk/4980421657