Better Research Through Better Data: Q&A with David Carr, Wellcome Trust

We put your questions about research data to David Carr, Programme Manager – Open Research, Wellcome.

These questions were asked during Better Research Through Better Data Live. Catch up on the recording here. The presentation slides are available here.

Q: What is your take on ethics in data?

David Carr: We fully support the European Commission’s concept in relation to data sharing of “as open as possible, as closed as necessary”. For research involving human participants, the privacy and confidentiality of research participants must be safeguarded. However, this must be balanced against the need to facilitate safe and secure re-use of these data in ways that will benefit health and society. In many cases, it is possible to establish mechanisms to enable this – for example through anonymisation and managed access provisions.

Q: Can I share my unpublished data? Will it be valid in future for publications? Will it remain my own Research Data?

David Carr: The short answer is yes, you can – we are keen to encourage researchers to share data that has not been published where this could add value – including null and negative findings. The answer to the second part perhaps depends on the context – if the question is targeted at sharing data ahead of publication, perhaps associated with a pre-print, then many major publishers are now clear that sharing findings in this way doesn’t pre-empt publication. In some disciplines there are established practices for sharing data ahead of publication that protect the data generators’ right to first publication of their primary analysis. In general, there is an expectation that data users should cite the source of the data appropriately when reporting any findings generated from the re-use of the data – recognising the work of the data generator. Making data available through a repository with a persistent identifier enables other researchers to readily cite the data.

Q: What are the benefits of sharing research data?

David Carr: I covered this at a high level in the talk and could write at length on this one! Very briefly, though, it can help maximise the value of the research – opening up the potential for other researchers to use the data to gain new insights that may advance discovery and its application for societal benefit. In addition, access to data and software underlying published findings is vital to enable those findings to be properly scrutinised and hence for ensuring trustworthiness more broadly. Finally, there is a strong argument that openness can enhance the efficiency of the research enterprise – potentially reducing duplication and accelerating progress.

Q: Why can authors not get rewards for their research?

David Carr: As I mentioned in my talk, research assessment practices in academia have traditionally placed a high emphasis on publications as the primary currency, with venue of publication being used as a proxy to judge the quality of the research. There is a widespread recognition that this needs to change, through initiatives such as the San Francisco Declaration on Research Assessment (DORA). Wellcome, together with other signatories to DORA, supports the principle that it is the intrinsic merit of the work that should be important – not where it is published.  We also think that other research outputs and contributions (including sharing of data and code, but also other valued contributions such as mentorship and influencing policy and practice) need to be taken into account. We believe this is a key part of embedding a fairer and more inclusive research culture.

Q: For an average research project, what percentage of funds do you expect would be dedicated to implementing the OMP?

David Carr: I have been asked this several times and always resist being drawn on a figure! My view is that it could be highly variable, from a couple of percent up to a significant proportion, and I would encourage researchers to put in the costs they need to maximise the value of their outputs rather than make it fit an arbitrary percentage. I think others have suggested it might average around 5 per cent – but sorry, I don’t have a reference to hand.

Q: Open data and open science do not seem to be easily and fully compatible with the Nagoya Protocol on Access and Benefit Sharing under the Convention on Biological Diversity. What is your/Wellcome's view on the claims to expand the scope of Access and Benefit Sharing obligations to "Digital Sequence Information"? And how would this affect open data/open science?

David Carr: It is absolutely vital that concerns over equity and benefit sharing are factored into discussions over data sharing. In general, for genomic sequence data, I’d argue that the global benefit will be maximised if the data is shared openly and without restrictions. However, this does not negate the need to ensure that appropriate mechanisms and frameworks are set in place to ensure that the benefits are shared in a fair and equitable way.

Q: Can you tell us who is part of the consortium for FAIRware? Is NIHR part of this consortium?

David Carr: Yes NIHR is one of five partner funders supporting this project as part of the Research on Research Institute. The others are the Austrian Science Fund (FWF), the Canadian Institutes of Health Research (CIHR), the Swiss National Science Foundation (SNSF) and the Wellcome Trust.

Q: Where can I find more information about FAIRware?

David Carr: There is more information about FAIRware on the Research on Research Institute website.

Q: Many researchers in LMICs feel that they are unequal partners in data sharing if they don't have the same capacity to manage data and analyse it i.e. it is UNFAIR.  What plans do Wellcome have to build capacity in LMICs to curate and manage data as well as analysing these data?

David Carr: This is a concern we are acutely aware of and have worked on for some time. In brief, and as noted above, it is vital that equity is at the heart of the approach for data sharing. As a funder, Wellcome has been committed for many decades to building research capacity sustainably in LMICs. In addition, through initiatives such as H3Africa, we have worked with communities in LMICs to develop models to make data available in a way that does not disadvantage LMIC researchers and increases the overall capacity for data analysis.

Q: Does Wellcome publish case studies that showcase the value of the research enabled with shared data?

David Carr: We have published various case studies – for example, a detailed report in 2016 into the Worldwide Antimalarial Resistance Network (WWARN) as a pioneering data platform. We have also run data re-use prizes to specifically celebrate projects that re-use existing data. We don’t currently have a centralised resource of case studies and there is a wider need for more and better exemplars.

Q: I have completed a research project relating to the COVID-19 lockdown. How can I participate in using this data to publish and share?

David Carr: This sounds an interesting study and, in line with other COVID-19 research, we’d urge you to share the findings and data as rapidly and openly as you can – including through considering publishing a pre-print and making the data available – ideally via a recognised community repository if possible (and in a manner which is appropriately anonymised and in line with relevant consents etc).

Q: Much of the COVID-19 data will be individual patient data (IPD). This can't be published but does need to be shared - how should this be approached to ensure the ethical and effective sharing of these data?

David Carr: As noted above, these data need to be shared in a way that safeguards the privacy and confidentiality of the individuals. Well-established mechanisms exist that can help ensure that, where possible, these data are made available – appropriately de-identified and with appropriate controls on data access. Initiatives such as the International COVID-19 Data Alliance are aiming to allow such data to be pooled and re-used in a safe and secure manner for global benefit.

Q: Would it be beneficial to develop a common shared research agenda that outlines which questions the shared COVID-19 data should be expected to answer? This would prioritize what needs to be shared and how.

David Carr: I think this would have value in terms of prioritisation, as long as it doesn’t restrict the potential for the data to also be re-used more widely.

Please note that some questions have been edited for clarity.

