“Data is the new black” and other highlights from Better Science Through Better Data

As a Better Science Through Better Data 2018 writing competition winner, I was invited to share my thoughts from #SciData18.

“Data is the new black” read the top-voted response to the “data is ____” poll posed to the audience of this year’s “Better science through better data” event. And it has certainly proven to be all the rage, with the conference featuring representatives from geography, biomedicine, demography and beyond. 

A series of ‘lightning’ talks provided a tour of data-centred research efforts in various fields, giving an insight into the growing role of data in scientific research.

Highlights included an account from a team investigating UK coastal erosion, a study which would traditionally have involved a labour-intensive data collection process. The team has instead created a citizen science program where volunteers can contribute photos of coastlines taken on their phones.

Even more ambitiously, the WorldPop project is building an open archive of global population information. Their talk demonstrated how they use novel sources, such as anonymous data from mobile phone companies, to construct a high-resolution picture of the population in rural and low-income areas where such information is often limited. Their datasets have aided organisations including the UN in research on epidemics, disasters and more.

In the field of medicine, two teams who are training machine learning algorithms on brain images - one to diagnose epilepsy and the other strokes - explained how they are sharing both their data and analysis software with the public; one speaker found that more people have downloaded their data than have contacted them to request their code, suggesting that this approach is encouraging independent research.

Naturally, questions surrounding scientific data - including standards, ethics, and transparency - arose throughout the day, and it was interesting to hear how the speakers have dealt with these challenges. The WorldPop team has conducted ethics research and developed protocols to ensure their data is anonymous; even the diagnostics studies had to anonymise their brain images to ensure patients’ privacy. Keynote speeches from Rebecca Boyles[1], Marta Teperek[2], Magdalena Skipper[3] and John Burn-Murdoch[4] explored these data-related issues in a suitably meta way, recounting studies on optimal data management and data visualisation practices, and introducing roles such as “data generalist” (someone with a breadth of knowledge in statistics and computing, who liaises between experts, can identify the appropriate analysis tools for a given scenario, and understands the limitations of data) and “data steward” (who governs and improves data management practices within a research team).

The central question of the event was “is better data making science better?”, and the consensus appeared to be that better data implies open data, because open data facilitates reproducibility (which is integral to good science). The conversation around this question culminated in a panel discussion on “The responsibility of reproducibility: whose job is it to change the status quo?”, where the panel and audience debated the technical and ideological barriers to open data, and made it clear that adopting open data policies is not going to be straightforward.

It became clear that scientists’ concerns about open data differed depending on the scale and nature of their experiments: sharing data can be expensive and time-consuming, especially for small research teams; some experiments produce complex datasets which require specialised software to be interpreted; and data privacy is of particular importance in medical research. If open data is the way forward, a balance must be struck to ensure transparency without placing an unreasonable burden on researchers.

Skipper’s keynote speech featured a poll she put to researchers asking what motivates them to share their data, and “freedom of information” ranked last. Even as it becomes technically easier to do open science, a radical cultural change will be essential for the open data movement to continue to gain momentum. 

Though the impact of open data on research is hard to measure, the variety of open science endeavours shared at this event, and the level of interest they have received, demonstrate that (at least in the research community) open data is regarded as a worthwhile mission, and one that researchers are willing to lead.


[1] Rebecca Boyles, Senior Manager, Bioinformatics and Data Science, RTI International

[2] Marta Teperek, Data Stewardship Coordinator, TU Delft

[3] Magdalena Skipper, Editor in Chief, Nature

[4] John Burn-Murdoch, Data Journalist, Financial Times
