As a Better Science Through Better Data 2018 writing competition winner, I was invited to share my thoughts from #SciData18.
“Data is the new black” read the top-voted response to the “data is ____” poll posed to the audience of this year’s “Better science through better data” event. And it has certainly proven to be all the rage, with the conference featuring representatives from geography, biomedicine, demography and beyond.
A series of ‘lightning’ talks provided a tour of data-centred
research efforts in various fields, giving an insight into the growing role of
data in scientific research.
Highlights included an account from a team
investigating UK coastal erosion. Such a study would traditionally involve a
labour-intensive data collection process, but the team has instead created a
citizen science programme in which volunteers contribute photos of coastlines
taken on their phones.
Even more ambitiously, the WorldPop project is
building an open archive of global population information. Their
talk demonstrated how they use novel sources, such as anonymous data from mobile
phone companies, to construct a high-resolution picture of the population
in rural and low-income areas where such information is often limited. Their
datasets have aided organisations including the UN in research on epidemics,
disasters and more.
In the field of medicine, two teams who are
training machine learning algorithms on brain images (one to diagnose epilepsy
and the other strokes) explained how they are sharing both their data
and analysis software with the public; one speaker found that more people have
downloaded their data than have contacted them to request their code,
suggesting that this approach is encouraging independent research.
Naturally, questions surrounding scientific data - including
standards, ethics, and transparency - arose throughout the day, and it was
interesting to hear how the speakers have dealt with these challenges. The
WorldPop team has conducted ethics research and developed protocols to ensure
their data is anonymous; even the diagnostic studies had to anonymise their
brain images to ensure patients’ privacy. Keynote speeches from Rebecca
Boyles, Marta Teperek, Magdalena Skipper and John
Burn-Murdoch explored these data-related issues in a
suitably meta way, recounting studies on optimal data management and data
visualisation practices, and introducing roles such as “data generalist” (someone
with a breadth of knowledge in statistics and computing, who liaises between
experts, can identify the appropriate analysis tools for a given scenario, and
understands the limitations of data) and “data steward” (who
governs and improves data management practices within a research team).
The central question of the event was “Is better data
making science better?”, and the consensus appeared to be that better
data implies open data, because open data facilitates reproducibility (which is
integral to good science). The conversation around this question culminated in
a panel discussion on “The responsibility of reproducibility: whose job is it
to change the status quo?”, where panel and audience debated the technical and
ideological barriers to open data, and made clear that adopting open data policies
is not going to be straightforward.
It became clear that scientists' concerns about open data
differed depending on the scale and nature of their experiments: sharing data
can be expensive and time-consuming, especially for small research teams; some
experiments produce complex datasets which require specialised software to be
interpreted; data privacy is of particular importance in medical research. If
open data is the way forward, a balance must be struck to ensure transparency
without placing an unreasonable burden on researchers.
Skipper’s keynote speech featured a poll she put to researchers
asking what motivates them to share their data, and “freedom of information”
ranked last. Even as it becomes technically easier to do open science, a
radical cultural change will be essential for the open data
movement to continue to gain momentum.
Though the impact of open data on research is hard to measure,
the variety of open science endeavours shared at this event, and the level of
interest they have received, demonstrate that, at least in the research
community, open science is regarded as a worthwhile mission, and one that researchers are
willing to lead.
 Rebecca Boyles, Senior Manager, Bioinformatics and Data Science, RTI International
 Marta Teperek, Data Stewardship Coordinator, TU Delft
 Magdalena Skipper, Editor in Chief, Nature
 John Burn-Murdoch, Data Journalist, Financial Times