How will open data advance scientific discovery?
Open data is an answer to the challenges posed by the changing nature of scientific research, often uncovering solutions to scientific problems in unlikely places. This article was my winning entry to the Better Science Through Better Data writing competition in 2018.
To this day, many of the most influential scientific discoveries are ones that were born of open data. Darwin spent years obsessively collecting specimens from around the globe, creating from this data his Theory of Evolution. Copernicus supported his controversial hypothesis that the earth orbits the sun with data taken from the sky and freely shared with contemporaries. John Snow, armed only with a city map and a healthy dose of skepticism, used local knowledge to identify contaminated water as the source of cholera outbreaks in London. While historically science has been the privilege of the wealthy, it was, in principle, open to anyone provided they could read, write and weren’t too preoccupied dying of the plague. But as scientific knowledge accumulated, the data needed to advance further became increasingly complex. Enter the particle accelerators, randomised controlled trials, and deep space telescopes.
Scientific data is now difficult and expensive to create, collect and analyse, and access rights are often exclusive. Motivations for closed data range from sensible to suspect, but the dawn of the world wide web prompted some experiments to share their data with the public. The National Center for Biotechnology Information (NCBI) were quick off the mark, releasing a database of DNA sequences in the ’90s; since then, discoveries made using NCBI and other public databases have set a strong precedent for open data.
Today, more data is collected than scientists can possibly analyse. Meanwhile, analytic tools are becoming more powerful and pervasive (let a deep neural network loose on a complex dataset and you may discover facts that would elude a human scientist alone for a thousand years). Consequently, lines between scientific disciplines are blurring. Open data brings together experts in different fields, often with surprising results. Last year, an open data challenge run by NASA - to design an algorithm that measures how dark matter distorts galaxy images - was won by a glaciology research student, whose solution beat NASA’s own algorithms. For those with a more casual interest, experiments lacking in manpower are ‘crowdsourcing’ observations, allowing amateurs to contribute to discoveries including, recently, an elusive ‘Tatooine’ planet.
What else might we stand to gain from open data? Well, data that led to the discovery of the Cosmic Microwave Background Radiation (CMBR) - probably the most significant cosmological discovery of all time - fell into the hands of two radio astronomers when they detected a mysterious signal during an experiment. Not recognising the signal for what it was (the CMBR), they went to great lengths to find its source (even evicting a family of pigeons from the apparatus), until they were tipped off by a third party. What other discoveries could be lurking in data possessed by unsuspecting scientists?
Open data is an answer to the challenges posed by the changing nature of scientific research, often uncovering solutions to scientific problems in unlikely places. We’re all natural data scientists, having evolved pattern-spotting abilities and inquisitive natures; open data will allow us to harness that power and channel it into science.