Earlier this year I was honoured to be invited to join the Committee on Data (CODATA) Data Policy Committee and to represent Springer Nature at a ‘High-Level Workshop on Implementing Open Research Data Policy and Practice in China’, which took place on 17-18 September 2019. The workshop was followed by CODATA’s annual international research data conference, CODATA 2019 Beijing, ‘Towards next-generation data-driven science: policies, practices and platforms’.
Having never visited China before, I was looking forward to experiencing some of its unique culture for myself: not only the sights and sounds, but also the Chinese approach to research and data. Over the last 5-6 years there has been a major cultural shift in attitudes to research data in many parts of the world, as evidenced by multiple surveys of researcher attitudes to data sharing (here, here and here) and the ever-increasing number of data journals and data papers. I was eager to learn what this cultural shift looks like for China-based researchers.
Infrastructure for research data
Being able to manage and preserve research data relies on the existence of reliable infrastructure, especially data repositories. Professor Guo Huadong (Chinese Academy of Sciences) is leading the effort to develop 20 national data centres, covering all types of research data. These 20 national data centres are planned to feed into an overarching cloud infrastructure called CSTCloud (similar to the European Commission’s vision for the European Open Science Cloud (EOSC)).
The CSTCloud project is being led by Professor Li Jianhui (Computer Network Information Center, Chinese Academy of Sciences). Professor Li, along with CODATA President Professor Barend Mons (Leiden University) and others, expressed the vision that the CSTCloud may one day be interconnected with the EOSC, and eventually with other regional cloud infrastructures as they are developed (for Africa, Latin America and South-East Asia), creating a truly global network for research data. Although we are a long way off from a global research cloud, I found it heartening to see so many in-depth conversations taking place across the week, in presentations, panels, formal discussions and informal chats, each addressing a distinct piece of the puzzle (e.g. technical infrastructure, metadata standards, cultural change) required to make this remarkable vision a reality.
Whilst development of the CSTCloud and the 20 national data centres is ongoing, many Chinese repositories with a more focused scope are making good progress in key aspects of open data. A notable example is the Fudan University Social Science Data Repository, which is built on the Dataverse platform and uses Dataverse’s Data Tags feature to colour-code the sensitivity of its datasets. This helps data users to distinguish more easily between data which may be sensitive, and thus should be used with careful consideration of possible harm, and data which may be used more freely.
It is clear that libraries have a very important role in enabling and embedding good research data management (RDM) practice in China. Over the course of several CODATA 2019 presentations, Dr Haiyua Cui and her colleagues at Peking University (PKU) library provided good examples of what research libraries can do in this arena. A notable initiative is the national data-driven research contests held annually since 2017. These were run to raise awareness of PKU’s institutional data repository, and of RDM in general, amongst the social scientists who are the repository’s core user group. As a result of the 2018 contest, PKU library recorded a 4-fold increase in data repository user accounts, from 2,000 to 8,000, and found that 87.5% of the participants could share their research data.
Publish or Perish?
Taking the time to prepare data for sharing is often not very high on the list of priorities for a researcher. However, as demonstrated by the PKU library, given the right kind of support, researchers are willing to share their data.
As we are increasingly seeing in all parts of the world, it was also clear in China that peer-reviewed data publications are valued as formal credit for the time and effort spent preparing research data for reuse by others. To address researchers’ concerns regarding career progression, research agencies should consider updating promotion and tenure evaluation policies to take account of the impact of all research outputs, and to let researchers know that data publishing is supported.
Several of the surveys presented at CODATA 2019 demonstrated that Chinese researchers often express a willingness to share data only under specific conditions. Co-authorship is a data sharing condition that came up several times, and is understandable when considering the pressure to publish. Whether data shared under a co-authorship condition can truly be considered ‘open data sharing’ is debatable. However, it must be noted that co-authorship is also required as a condition of data sharing outside China, for example by the Alzheimer’s Disease Neuroimaging Initiative. How conditional data sharing might be resolved for Chinese research remains unclear.
Planning to manage and archive research data
When Springer Nature surveyed Chinese researchers on their attitudes to sharing data, the results showed that 93% of Chinese researchers had completed a Data Management Plan (DMP) in the last two years, compared to 70% worldwide. The Chinese State Council’s ‘Scientific Data Administrative Measures’ (the Measures), published in May 2018, make it mandatory to complete a DMP.
After an intense week of workshops, presentations, panels, formal discussions and informal chats, it is clear that there are key questions still to be answered with regard to research data in China. What it means to be FAIR, and what open data in China look like, are two such unanswered questions. However, it is significant and positive that Chinese research agencies are engaging with the international data stewardship community through events such as CODATA. In addition, announcing policies together with clear guidance and resources for implementation means that policies on research data can quickly make an impact on the way research in China is carried out.