How sharing your data could increase your citations

A growing body of evidence suggests that research papers with their data shared in a repository are cited more. Here we look at this evidence and what it means for researchers and their data.


Data are both a foundation and an output of good research. Growing recognition of this has led to researchers being increasingly encouraged or required to share data (along with code and protocols) as the evidence behind their published work, driving transparency and reproducibility [1]. The COVID-19 pandemic in particular has demonstrated the need for evidence-based decision making, with research data coming under wide scrutiny [2,3].

Much of the impetus for including data sharing in journal and funder policies comes from its wide benefits to science and society. Conversely, not making research data FAIR (Findable, Accessible, Interoperable and Reusable) carries huge costs [4]. However, these policies are often more stick than carrot, and researchers have indicated that better credit would motivate them to share their data [5].

This question of credit for data sharing has prompted a number of studies to look at the links between existing credit mechanisms, mainly citations, and data sharing.

Open data in open access publishing

A key study published in PLOS ONE in 2020 [6] found that articles sharing their data in a repository were associated with up to 25.36% more citations. The method categorised articles by their data availability statements (DASs) as data:

    • available in a repository
    • available ‘within’ the paper/Supplementary Information (SI)
    • available on request
    • not available

and then examined the citation impact associated with each category.
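To make the categorisation step more concrete, here is a minimal, hypothetical sketch of how DAS text might be sorted into the four categories using keyword rules. This is not the classification method used in the study. The category labels, the keyword patterns and the `classify_das` function are all illustrative assumptions.

```python
import re

# Hypothetical keyword patterns for each DAS category (an assumption, not the
# study's rules). Order matters: the first matching category wins, so a
# statement mentioning both a repository and SI is counted as repository sharing.
CATEGORY_PATTERNS = [
    ("repository", r"\b(deposited|repository|dryad|figshare|zenodo|genbank|accession)\b"),
    ("on request", r"\b(on|upon) (reasonable )?request\b"),
    ("within paper/SI", r"\b(within the (paper|article)|supplementary|supporting information)\b"),
    ("not available", r"\b(not (publicly )?available|cannot be shared)\b"),
]


def classify_das(statement: str) -> str:
    """Assign a data availability statement to one of four illustrative categories."""
    text = statement.lower()
    for category, pattern in CATEGORY_PATTERNS:
        if re.search(pattern, text):
            return category
    # Statements matching none of the hypothetical patterns are left unlabelled.
    return "unclassified"
```

For example, `classify_das("All relevant data are within the paper and its Supporting Information files.")` returns `"within paper/SI"`. Real DASs are far more varied than a handful of keywords can capture, which is one reason large-scale studies need more robust classification than this sketch.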

The study analysed the DASs of over half a million open access articles published in PLOS and BMC journals (where DASs are required) up to 2018. Its citation prediction model was based on almost 46,000 of these articles published up to 2015, allowing a three-year window in which to assess citation impact.

This study covers both a larger sample and a wider range of disciplines than previous research, putting solid figures on the impact of data sharing on the credit that authors receive. There are a number of interesting implications:

  1. Putting data in repositories was the only method of data sharing significantly correlated with citation impact. This suggests that more common methods, such as sharing via SI, are of less value to a researcher in gaining credit.
  2. Despite this benefit, and despite journal policies encouraging or requiring data deposition, sharing data in repositories is still not widespread: only 12.2% (BMC) and 20.8% (PLOS) of articles shared data this way.
  3. As data sharing policies from publishers and funders have become much more widespread since 2015, there is potential to develop this line of investigation to get a fuller picture.

Disciplinary differences

The research above builds on studies of specific research areas. Four such papers report positive citation impacts ranging from 9% to 50% in the fields of gene expression [7], astronomy [8], astrophysics [9] and paleoceanography [10]. Just as data sharing expectations and requirements vary between research communities, particularly around the use of repositories, so the impact seen in these studies spans a range of values.

Certain fields have long-standing community expectations around data sharing and established specialist repositories (such as those for nucleic acid sequences). These areas typically see much higher levels of data sharing, but taken together these studies indicate how widespread the citation impact of the practice can be across disciplines.

Why is there a link?

The studies mentioned above mainly establish a correlation between citations and data sharing, rather than the cause of this relationship. However, a number of suggested explanations tally with the wider benefits of data sharing to research and society as a whole:

  • Reuse potential: openly available data are an additional reusable research output, making it more likely that the study forms the basis of further research and that the paper is cited.
  • Reproducibility and transparency: as data form the evidence behind a paper's results, making this evidence openly available may signal to readers that the paper's claims can be trusted, or at least more easily assessed.

A growing focus in research data communities is the specific tasks that readers complete when searching for data or interacting with data via article pages [11].

Alternative metrics

Citations are just one form of currency for academic credit. A concerted effort is underway to move away from purely journal- and article-level metrics and to consider a wider range of research outputs, in a way that is less focused on prestigious publications [12]. In this context, data sharing can help a researcher both under traditional metrics and in this shift to a more equitable basis for credit: sharing research data appears to support citation counts, and the data themselves represent an under-shared alternative research output.

Many repositories provide alternative metrics and the capacity to cite data directly (see our related blog on this subject), enabling better recognition of data outputs outside of the article context.

Implications for researchers and publishers

For researchers, a straightforward conclusion is to share your research data and to link it to your publication. Beyond policy compliance and meeting community expectations, the support this provides for your paper contributes to reproducibility and transparency. These may be among the factors that lead such papers to be better cited on average.

For publishers, there is a clear need to make data sharing and linking as straightforward as possible in publishing processes and systems, for the benefit of submitting authors. Data sharing and data linking are still not the norm except in specific research areas. These areas show that a concerted effort, combining policy, technical support and clear benefits to authors, can drive better data sharing.

Do you have a question about research data? 

Get free help and advice on sharing your research data: visit our research data help desk.

References

  1. Hrynaszkiewicz et al (2017) https://doi.org/10.2218/ijdc.v12i1.531
  2. https://www.pfizer.com/news/press-release/press-release-detail/pfizer-and-biontech-announce-vaccine-candidate-against
  3. https://www.theguardian.com/world/2020/jun/03/covid-19-surgisphere-who-world-health-organization-hydroxychloroquine
  4. Publications Office of the EU (2019). Cost of not having FAIR research data. https://op.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1/language-en
  5. Digital Science et al (2019). The State of Open Data Report 2019. https://doi.org/10.6084/m9.figshare.9980783.v2
  6. Colavizza et al (2020). The citation advantage of linking publications to research data. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0230416
  7. Piwowar & Vision (2013) https://doi.org/10.7717/peerj.175 
  8. Henneken & Accomazzi (2011) https://arxiv.org/abs/1111.3618 
  9. Dorch et al (2015)  https://arxiv.org/abs/1511.02512 
  10. Sears et al (2011) https://figshare.com/articles/Data_Sharing_Effect_on_Article_Citation_Rate_in_Paleoceanography/1222998/1
  11. https://www.rd-alliance.org/groups/data-discovery-paradigms-ig
  12. San Francisco Declaration on Research Assessment (2012) https://sfdora.org/read/

Banner image by Anna Nekrashevich from Pexels

Graham Smith

Senior Research Data Editor, Springer Nature

At Springer Nature I work to develop and promote data publishing tools, initiatives and policies across the organisation. I have an academic background in geology and geophysics, specifically studying seismics at live volcanoes. I have previously worked in a similar data-focused role at the Natural History Museum, managing data pathways and curation practices for big taxonomic and collection data.