The Data Availability Statement (DAS) is one of the most important sections for the future credibility of a manuscript. It is where interested readers (and, to some degree, automated algorithms) go to locate the data on which your manuscript’s claims rest, whether to verify those claims or to use the data for further research.
Unfortunately, the DAS is also the section most likely to be neglected or written in haste. This blog post gives simple tips for writing a dazzling (or, at least, useful) DAS, along with illustrative examples.
Tip 1: Use manuscript elements (Figures, Tables, Methods) to help identify your data. In general, charts, plots and tables summarise larger amounts of data. Ask yourself “Have I mentioned the data used to generate this figure/table in my DAS?” You can even say which data files underlie which figures and tables, as in this example from a published DAS:
- The histology images supporting Fig. 2 and Figs. 4–8, are publicly available in the figshare repository, as part of this record: https://doi.org/10.6084/m9.figshare.11907768 . Data supporting Fig. 3, Tables 1–5 and Supplementary Tables 1–3 are not publicly available in order to protect patient privacy.
Methods subsections describe the steps you went through in your investigation, but can also be thought of as the ways you processed your data on your way to obtaining the results. Aim to share the data for all the main steps, and consider sharing data for all the intermediate steps.
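One way to make the check in Tip 1 routine is to keep a small inventory that maps each figure and table to the location of its data. The sketch below is only an illustration under stated assumptions: the names `inventory`, `missing_elements` and `draft_das`, the sentence templates, and the example entries are hypothetical, and the output is a starting draft to edit by hand, not finished DAS text.

```python
# Hypothetical inventory: each manuscript element mapped to the location of
# its underlying data. None marks data that cannot be shared publicly.
inventory = {
    "Fig. 2": "https://doi.org/10.6084/m9.figshare.11907768",
    "Fig. 3": None,
    "Table 1": None,
}
restriction_reason = "to protect patient privacy"


def missing_elements(manuscript_elements, inventory):
    """Return the figures/tables that have no entry in the inventory yet."""
    return [e for e in manuscript_elements if e not in inventory]


def draft_das(inventory, reason):
    """Build draft DAS sentences, grouping elements that share a location."""
    by_location = {}
    for element, location in inventory.items():
        by_location.setdefault(location, []).append(element)

    sentences = []
    for location, elements in by_location.items():
        names = ", ".join(elements)
        if location is None:
            # Always state why the data are not shared (illustrative wording).
            sentences.append(
                f"Data supporting {names} are not publicly available {reason}."
            )
        else:
            sentences.append(
                f"Data supporting {names} are publicly available ({location})."
            )
    return " ".join(sentences)


# Elements that appear in the manuscript but are not yet covered.
print(missing_elements(["Fig. 1", "Fig. 2", "Table 1"], inventory))
print(draft_das(inventory, restriction_reason))
```

Running `missing_elements` against the full list of figures and tables before submission gives a quick answer to the question “Have I mentioned the data used to generate this figure/table in my DAS?”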
Tip 2: Put your data in a repository. Data that are “available on request” are better than data that are not available at all, but this arrangement is subject to several pitfalls. Is the data owner still using the same email address? Do they remember which hard drive the data are on? Are the data small enough to be emailed? Are the data on media that can still be read?
Repositories make sharing data easier. For many types of data, there are domain-specific repositories with specialist features that enhance interaction with the data. Some repositories offer controlled/restricted access (e.g., for sensitive data). Also, beware: for some data types, there are community mandates stating that the data must be shared in particular repositories. Springer Nature maintains a list of recommended repositories which can help you pinpoint an appropriate location for your data.
Tip 3: Declare all the data you generate. Imagine a reader who has too little time to read the whole paper (imagine that!) but wants to know about the data. Sentences like “All the data mentioned in this paper…” don’t help much, because that reader would need to read the whole paper to understand what ‘the data’ are.
Instead, be explicit about each piece of data you produced. This extract from a comprehensive DAS provides a good example, with each data output clearly stated along with its location (repository name) and a reference (which will contain a PID* link to the data):
- “Image data were extracted from the clinical PACS… stored in DICOM standard format on TCIA as collection COVID-19-AR. All clinical data were obtained from the Arkansas Clinical Data Repository (AR-CDR)... are provided on TCIA. Viral genomes… are available from the NCBI Sequence Read Archive.”
*NOTE: Good repositories provide a PID (persistent identifier), which will either be a DOI or an accession number. You should add a Reference for your data, and a citation in the DAS. You should also state which repository your data are in.
Tip 4: If your data are not openly available, say why. Researchers are sceptics. If they see only “The data are not available,” they are likely to wonder “What are they trying to hide?” Always state the reason you cannot share the data underlying the claims in your manuscript. If your data are available under certain conditions, or under a DUA (Data Use Agreement), state those conditions in your DAS. Also, because a corresponding author’s contact details can go out of date, it is good practice to also name a role at the institution as a contact point.
Here are a few examples:
- “Raw data for dataset D1 are not publicly available to preserve individuals’ privacy under the European General Data Protection Regulation.”
- “As koalas are a vulnerable species, we cannot share the location data we gathered here in order to protect them from hunting.”
- “The cell phone mobility data was acquired from COMPANYX, and they have not given their permission for researchers to share their data. Data requests can be made to COMPANYX via this email: email@example.com”
Tip 5: Include a reference for your data, and cite it from the DAS. You may also add an in-text link to your data. This redundancy facilitates automatic detection of datasets by algorithms that trawl DASs or References sections. Having your data in your References section also lets other researchers give you credit for your data. Here’s a good example:
- “The Leipzig catalogue of vascular plants dataset contains 1,315,562 vascular plant names with 351,180 accepted species names, and is available for download from the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig repository at https://doi.org/10.25829/idiv.1806-40-3009 ”
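To illustrate why an in-text link helps automated detection, here is a minimal sketch of how a script might pull DOIs out of DAS text. The regular expression is a deliberately simplified assumption, and `find_dois` is a hypothetical helper; neither is claimed to be the method used by any particular indexing service.

```python
import re

# Simplified DOI pattern (an assumption, not an official grammar):
# "10." + a 4-9 digit registrant prefix + "/" + a suffix with no whitespace.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_dois(text):
    """Return the unique DOIs found in text, in order of first appearance."""
    dois = []
    for match in DOI_PATTERN.finditer(text):
        # Drop sentence punctuation that may directly follow the DOI.
        doi = match.group(0).rstrip(".,;:)")
        if doi not in dois:
            dois.append(doi)
    return dois


das = (
    "The dataset is available for download from the iDiv repository "
    "at https://doi.org/10.25829/idiv.1806-40-3009 ."
)
print(find_dois(das))
```

A statement that names the dataset only in words would give a script like this nothing to find, which is one reason to include both the link in the DAS and a full entry in the References section.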
Tip 6: If there really are no data underlying your manuscript, say so. The standard text is “No datasets were generated or analysed during the current study”, and it could save someone hunting for data a lot of time.
That covers the main points. If you have specific questions about data sharing, reach out to our free Research Data Helpdesk.