Figshare and Digital Science published the 2018 State of Open Data report this week. Based on an annual survey run with us at Springer Nature since 2016, the report tracks changes over time in researchers' views and actions in managing and sharing experimental research data. The report also includes a number of contributed articles, including one from me (pasted below). The report is freely available to download on Figshare. The data is also available on Figshare, along with a visualisation of the results.
This year our contribution to The State of Open Data report highlights a challenge I think is particularly important. To accelerate data sharing, researchers need to be confident they will get proper credit. This year's survey found that “58% of respondents felt they did not get sufficient credit for sharing data, as opposed to 9% who felt they do.”
Sharing data needs to be "worth their time" in terms of meeting their goals: advancing their field, building visibility for their research, winning funding and progressing their academic career and reputation. The article I contributed to this year's State of Open Data report is below. It outlines some of the challenges, and some of the steps we can take to make progress. Springer Nature is active in many of these, and this work will form a central part of our plans and activities in 2019.
I give some examples of community initiatives below, and some of the things Springer Nature is already doing or supporting. These are not comprehensive: across Springer Nature we're also experimenting with open data badges and registered reports in the BMC and Nature Research Groups, piloting open solutions for software code, driving reproducibility in research, and so much more. We're also involved in many community initiatives not mentioned here, thanks to my colleagues across Springer Nature.
What do you think? What more can Springer Nature be doing to support credit for good data practice, and advocate for change in the research community? What do you think would convince researchers it is worth their time? Ideas, comments and suggestions welcome. We're currently planning some new market research on this topic in 2019.
From the 2018 State of Open Data Report:
From Green Shoots to “Grassroots”: How Can We Accelerate Data Sharing?
Grace Baynes, VP, Research Data and New Product Development, Open Research, Springer Nature
Figshare’s tracking of researchers’ attitudes and actions in data sharing continues to bear new insights. Now in its third year, the 2018 survey shows some encouraging progress in respondents reporting making data openly available, up consistently year on year since 2016 to 64% in 2018. More researchers also report publishing data in a specific data repository this year (33%, compared to 29% in 2017), which is great for making data more findable, accessible, usable and citable. Yet a closer look makes clear the work still to be done. A large percentage of this year’s respondents do not feel they get credit for sharing data, and publications in data journals remain a fraction of the world’s annual publications. With funders and institutions seeking “grassroots” support for data sharing from the research community, the issue of credit for good practice in data management and sharing is a fundamental one, with no easy answers. What steps can we take to make data sharing worth a researcher’s time and energy, and accelerate progress?
Encouraging data sharing: policy is not enough
This year’s State of Open Data survey shows more support among respondents for national mandates (63%) than in 2017 (55%). China’s Ministry of Science and Technology this year introduced its “Notice of the General Office of the State Council on the Measures for Managing the Printing and Distributing of Scientific Data”, which effectively mandates data sharing at a national level. The European Commission’s Horizon Europe proposal will mandate open access to research data as well as publications. Globally, more than 50 funders now require data sharing, with the majority based in the US and Europe, particularly the UK. Yet in Springer Nature’s Practical challenges for researchers in data sharing, a survey of more than 7000 researchers, we found self-reported levels of sharing among respondents in the UK (58%) and US (55%) were below the global average of 63%.
Funders are increasingly committed to coupling policy with practical support for researchers. To give just a few examples, the European Open Science Cloud and NIH Data Commons pilot have significant funding and infrastructure behind them, as do investments by Wellcome and UKRI/JISC. Of note for funders from this year’s State of Open Data survey is the marked increase in uncertainty about where funds will come from to support making data open.
A wish for the drive for data sharing to come from the research community itself has been another common thread in conversations with funders and foundations in the US, UK and Japan this year. Rather than a top-down, policy enforcement approach, many of the funders we have spoken with want the research community to create the momentum to share data, and help define discipline-appropriate ways of sharing. Some institutions are also taking this “grassroots” approach. TU Delft provides one case study, presented at the LIBER conference this year. The university embeds “data stewards” in every faculty to support researchers in good data practice, and also provides training, additional funding for data management and data publication, and a data repository via DANS. TU Delft is now in the process of developing its research data policy, which will be adapted by each faculty based on disciplinary needs. This is a long-term investment in the “bottom-up” approach, and will be worth watching over the next few years.
Finding the keys to grassroots support for data sharing
In a number of fields, data sharing is already the established norm, supported by community standards, dedicated repositories and long-standing funder mandates. Yet in Springer Nature’s Practical challenges for researchers in data sharing survey, we found that only 54% of respondents who produce specific biological and medical data (e.g. DNA and RNA sequences) are using existing dedicated community repositories to share their data. Making it easy to find out where to share data is clearly still important.
Responses to the question “Which one of the circumstances you chose would motivate you the most to share your data?” would suggest that visibility of research findings and the public good are the keys to making data sharing the status quo. “Funder requirements” were stated by just 69 respondents, ranking below (in order of popularity) increased visibility and impact, public benefit, transparency and reuse, journal and publisher requirements and getting proper credit. In my view, this masks the real issue. Researchers would share data more routinely, and more openly, if they genuinely believed they would get proper credit for their work that counted in advancing their academic standing and success in career development and grant applications, and for subsequent work that builds on their data. As noted in the analysis, “58% of respondents felt they did not get sufficient credit for sharing data, as opposed to 9% who felt they do.”
The 600+ free text responses to the question: “What credit mechanisms do you think would encourage more researchers to share their data?” warrant further analysis. Common themes from an initial review include citation, coauthorship and collaboration, and credit in research assessment.
We should not ignore the barriers and challenges that researchers experience in sharing data. Here this year’s State of Open Data survey adds some interesting insight to the body of research on this topic. The top six responses to “What problems/concerns do you have with sharing datasets?” were “Concerns about misuse of my data”, “Unsure about copyright and licensing”, “Not receiving appropriate credit or acknowledgement”, “Unsure I have the rights to share”, “Organising data in a presentable and useful way” and “Contains sensitive information”. All were selected by more than 400 respondents. To my knowledge, this is the first time concerns about misuse of data have come out so strongly in a global survey.
By contrast, in Practical challenges for researchers in data sharing, “Organising data in a presentable and useful way” was the most stated reason for not sharing data (46% of respondents). Other common challenges were: “Unsure about copyright and licensing” – 37%; “Not knowing which repository to use” – 33%; and “Lack of time to deposit data” – 26%.
With regard to short-term actions, we need to understand researchers’ concerns about “misuse of data” much better. Perhaps simpler to tackle is making sure that researchers are clear about their rights to share, and the copyright and licensing options available to them. Helping researchers to deposit, describe and share their data, using good metadata, remains a priority for Springer Nature.
Credit mechanisms of today: Data publication and data citation
To provide true credit for good data practice, published, citable datasets need to be viewed as research outputs on a par with a research article in terms of career advancement and assessment. Realistically, routine inclusion of datasets, their citations and impact in grant assessments and CV evaluation is probably still years away.
In the meantime, we can encourage and measure the usage and citations of datasets. Initiatives such as the GO FAIR metrics group, the FAIRdat project from DANS and MakeDataCount are making strides in this area. Figshare and other repositories include download and citation statistics, and alternative metrics for datasets. They also provide DOIs or other unique identifiers for datasets, ensuring they are citable in their own right.
Encouraging and enabling data citations is also critical. As noted in the analysis of this year’s survey, “data citations are motivating more respondents to make data openly available, increasing 7% from 2017 to 46%”. Here there are also encouraging community initiatives we should support. For example, DataCite provides DOIs for research data, a searchable registry of datasets, and a citation formatting reference tool. FORCE continues to make progress in implementing its Data Citation Roadmap with publishers and other stakeholder groups. Publishers are increasingly providing links to datasets on articles, and including dataset citations in article metadata.
Data articles provide an established credit mechanism, a citable publication, while making datasets easier to find, access and reuse. Yet uptake of publishing data descriptors in data journals continues to be low. In this year’s survey, 18% of respondents reported publishing data in a data journal, compared to 20% in 2017. These percentages are high compared to the global research community. The two largest dedicated data journals by volume are Elsevier’s Data in Brief and Springer Nature’s Scientific Data. Both have grown strongly in 2018, on track to publish close to 2000 and 300 articles respectively. Together, that’s just over 0.1% of the estimated 1.8 million articles published in English language journals annually. Perhaps there is more we can do here to make it easier for researchers to write and publish data articles, and see the benefits to their research in doing so.
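For readers who want to check the proportion quoted above, here is a minimal sketch of the arithmetic in Python. It assumes the rounded figures given in the text (about 2000 and 300 data articles, and an estimated 1.8 million English language articles a year); the variable and function names are purely illustrative.

```python
# Rounded figures quoted in the text; these are approximations, not exact counts.
data_in_brief_articles = 2000        # Data in Brief, projected for 2018
scientific_data_articles = 300       # Scientific Data, projected for 2018
annual_english_articles = 1_800_000  # estimated articles per year


def share_percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


combined = data_in_brief_articles + scientific_data_articles
data_journal_share = share_percent(combined, annual_english_articles)

# Prints: 2300 data articles = 0.13% of annual output
print(f"{combined} data articles = {data_journal_share:.2f}% of annual output")
```

Even allowing for rounding in the inputs, the two largest data journals together account for only a little over one in a thousand published articles.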
We need to tell more stories about the benefits of data sharing
There is compelling evidence as to the benefits of managing and sharing data, including productivity and citation advantages. I referenced these in my contribution to last year’s State of Open Data report. I still include them in almost every talk I give, because they continue to be “new news” when I share them, and not widely known. We need to continue to provide this evidence to the research community. We also need to do a much better job of finding and telling stories about researchers who are sharing data, the impact on their work and on the fields they work in. These real-world examples and evidence, coupled with better credit, clear funding, practical help and answers to common questions, are all essential to making data sharing an established norm more quickly. There are no easy answers, and no “silver bullet”, but there is much we can act on now.
This post originally appeared as part of Digital Science’s “The State of Open Data Report 2018”, and is published under a CC BY 4.0 license. The full report can be found on Figshare: https://doi.org/10.6084/m9.figshare.7195058.v1