Journal policies increasingly recommend or require authors to share the data underlying the claims of their article, and, more importantly, to be explicit about all aspects of the data involved in their article. So what should you do if your research didn’t generate any data or if your research reused existing data?
The first thing is to double-check that there really aren’t any data underlying your article. In many cases, researchers overlook data that are not central to their claims, but which are nevertheless necessary in reaching the results. In other cases, researchers may feel that some of their data would be of no interest to other parties. Even so, these data can be useful to others looking to replicate your work or verify their own methods. In general, it is expected that only particular types of articles, for example Reviews, may not be founded upon data.
Here are two rules of thumb that can help when checking an article for data that should be explicitly mentioned in the data availability statement:
- Does the article provide summary statistics such as averages, standard deviations, p-values, etc.? If yes, there are raw data files from which these were generated.
- Do any of the sub-headings or paragraphs in the Methods mention a technique that generates data? If yes, those data underlie the article as well.
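As a rough illustration of the first rule of thumb, the check can be partly automated by scanning the manuscript text for terms that signal summary statistics. The sketch below is only a starting point: the function name `find_summary_statistics` and the patterns in `STAT_PATTERNS` are hypothetical choices made for this example, not part of any journal policy or tool, and a keyword match is a prompt to look more closely rather than proof that data exist.

```python
import re

# Hypothetical keyword patterns chosen for this example; extend them
# to suit the statistics commonly reported in your own field.
STAT_PATTERNS = {
    "mean or average": r"\b(?:mean|average)s?\b",
    "standard deviation": r"\bstandard deviations?\b|\bSD\b",
    "p-value": r"\bp\s*[<=>]\s*0?\.\d+|\bp-values?\b",
}


def find_summary_statistics(text):
    """Return the labels of the summary-statistic patterns found in text."""
    return [
        label
        for label, pattern in STAT_PATTERNS.items()
        if re.search(pattern, text, flags=re.IGNORECASE)
    ]


if __name__ == "__main__":
    sample = "The mean response time was 4.2 s (SD 0.8, p < 0.05)."
    for label in find_summary_statistics(sample):
        print(f"Possible underlying data: {label} reported")
```

Any label returned suggests there are raw data behind the reported figure, which should then be mentioned in the data availability statement. The second rule of thumb, spotting data-generating techniques in the Methods, depends heavily on the discipline and is probably better checked by reading.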
Next, whatever the situation with data is, you should include a transparent data availability statement. This can save those hunting for specific types of data from having to read the whole article just to determine whether you have the data they need, and it also facilitates wide-scale analysis of data availability [https://doi.org/10.2218/ijdc.v13i1.614, https://doi.org/10.1098/rsos.180448]. There is currently no consensus on whether the data availability statement should contain only information about data you generated, or also information on the data you reused. We encourage the latter so interested readers can get a complete data overview by just reading this section.
Moving on to cases where you’ve used existing data from others’ research, the first question to ask is: has it already been shared or not? If yes, then you should cite the data (in the data availability statement and anywhere else you deem appropriate). A common error here is to cite only the article associated with the data and not the data itself. The rule is “cite what you use”: cite the article if you used the theories, and cite the data if you used the data. Also bear in mind that citing only the article associated with the data fails to directly link your article with the data it is founded upon. As more articles are published based on the data and its later analyses, interested parties may have to backtrack through a chain of citations in order to find the article which actually provided the direct link to the data.
On a related note, when reusing existing data from other sources, it is still best practice to share the data generated via your meta-analysis. Often, these are small enough to fit either in the tables of the manuscript or the supplementary materials. It is always best to also provide an accessible copy via a repository.
One final case of note is how to deal with requests for data that you have received from other parties (i.e., you produced a paper based upon data that you received upon request, and now someone else is requesting access to that same data). This is a very clear example of why ‘available upon request’ is a poor approach to data sharing as it results in either having to refer data requesters to someone else (despite the fact it is your article they are interested in) or having to discuss bespoke terms for re-sharing the data when you first receive it. Data repositories overcome both of these problems.
For guidance on how to cite data, see this post on the basics of data citation. For more information on how to write a data availability statement for no or reused data, see our author information page: https://www.springernature.com/gp/authors/research-data-policy/data-availability-statements/12330880 .