Love (for Data Sharing) in the Time of COVID
In December 2019, an emerging infectious disease began to spread around the world. As I write this in early March 2020, the WHO counts 59 countries with cases of this disease, now called coronavirus disease 2019 (COVID-19). Many deaths have occurred.
From the early days of the epidemic, there has been tremendous energy around sharing early and often.
For example, preprint servers have been used to share COVID-19-related manuscripts before the often-lengthy process of peer review. A search of the medRxiv preprint server for “corona virus” identifies 32 results, nearly all of which appear to focus on COVID-19. The bioRxiv preprint server hosts additional manuscripts dealing with COVID-19. Wellcome and Springer Nature have quickly created a platform to help the research community identify high-priority preprints and review them.
Many journals have made articles about COVID-19 freely available. The rationale for the rapid, free availability of these articles based on the urgency and seriousness of the situation is self-evident. New insights into the prevention or treatment of COVID-19 could save a large number of lives.
Sharing new breakthroughs is essential, and so is sharing information about what has not worked well. Physician William Withering said so eloquently in his 1785 treatise on the use of foxglove as a medication:
“As no pains were spared to prevent the return of the dropsy*, and as the best means I could devise proved unequal to my wishes, both in this and in some other cases, I shall take the liberty to point out the methods I tried at different times in as concise a manner as possible, for the knowledge of what will not do, may sometimes assist us to discover what will.”
[emphasis added; *an archaic term for swelling often caused by heart failure]
Whether articles about other medical and scientific problems should also be made available rapidly and without cost to readers deserves consideration. After all, the urgency of scientific research is high for patients who have a potentially fatal malignancy or other potentially catastrophic health problems.
Beyond the issue of sharing articles rapidly, sharing COVID-19 data has been largely a success story, with a great deal of sharing happening in short order. A caveat is that the CDC recently stopped reporting the number of COVID-19 tests performed in the US, leading to criticism of the adequacy of its data sharing.
As the founder of the Research Symbiont Awards recognizing excellence in data sharing and as a physician-scientist, I have watched with optimism as data regarding COVID-19 has been made available in so many instances. At the same time, I remain concerned that data sharing in other situations is more limited than it should be.
It is my hope that the events of this epidemic are seen not as a one-off exception to business as usual, but rather as an example of a better way of sharing information, with urgency aimed at accelerating research progress. Is it not the case that a sense of urgency in sharing what we know would be to the public benefit in other areas of medical research, and science more generally? As an example, patients with cancer need information to be shared with urgency because their timelines for decision-making may be as short as the timelines faced by public health officials in an epidemic.
The purpose of our Research Symbiont Awards is to encourage a shift in the culture of science toward increased, earlier, and more effective data sharing.
The rest of this blog post highlights the 2020 winners of the Research Symbiont Awards and their contributions that were recognized at this year’s Pacific Symposium on Biocomputing.
The 2020 Research Symbiont Winners
2020 Early Career Research Symbiont Award
Alex Lenail was the 2020 winner of the Early Career Research Symbiont Award, which recognized his work in sharing a project related to amyotrophic lateral sclerosis (ALS). At the time of his nomination, Alex was a PhD student at MIT. Alex built a data portal to share data from 1,000 ALS patients. He collected, identically pre-processed, and systematically harmonized approximately 400 TB of diverse biomolecular data, and the resulting data portal is publicly available: http://data.answerals.org/
Alex’s perspective on data sharing and the Research Symbiont Awards:
As someone who started my career in software engineering, the way software companies are arranged has been my default, and the way academia is organized has always struck me as peculiar in contrast, which has led me to question some of its assumptions.
Play a thought experiment with me: imagine biomedical researchers worked in one large organization, committing all code and data to a single shared repository, which anyone could build off of. Imagine if peer reviews were openly available upon publication in accompaniment with the manuscripts’ results. Imagine we never “reinvented the wheel”, because published code and protocols were standardized and easily replicable. What I’m describing seems unbelievable for academia, but it’s nothing out of the ordinary for large software organizations.
Now, I think we believe there are benefits to academic research proceeding somewhat “chaotically” which outweigh some of the benefits of a more managed system. But I don’t think there are no lessons to learn from the way software companies are organized. And I think whether academic research has converged on the optimal way of working together to make important discoveries is a question we should pose to ourselves.
One way I think we could unlock more potential would be to reduce the friction inherent in all aspects of working with data in academia today. Getting access, downloading, harmonizing, pre-processing -- these all add time between asking a question and getting an answer. We have come to expect immediate answers to many of our questions, so it feels like we should be able to just “look up” whether a dataset supports a hypothesis or not, or the code someone ran to make some plot in an article.
Although that’s a long way from where we are today, I think we should aspire to move in that direction. Were we to live in a world where you could “just look up” scientific data, I think we would collectively ask better scientific questions, and spend more time doing original thinking.
2020 General Symbiosis Award
Brian M. Bot was the 2020 winner of the General Symbiosis Award, which goes to a more established researcher. Unfortunately, owing to timing issues, he was not available for comment. Brian is the curator of the mPower Public Researcher Portal, created by Sage Bionetworks. The data in this portal come from one of the first large-scale attempts to assess the feasibility of quantifying Parkinson disease symptoms and their changes in a ‘real world setting.’ The researchers made the first six months of data available quickly, in fact years before the manuscript analyzing these data was submitted for publication.
Beyond these facets of the work, the data were collected with an informed consent process that let participants choose whether their data were (1) shared only with the study team; or (2) shared broadly with qualified researchers worldwide. At the time of Brian’s nomination, 229 researchers had gone through Sage’s qualified researcher process, gaining access to these data.