How should Findable, Accessible, Interoperable and Reusable (FAIR) data work in practice?

This post was my winning entry to the Better Science Through Better Data writing competition in 2018.

Feb 06, 2019

On a cool spring day, you can take a stroll through what is “probably the most researched area of woodland in the world”: Wytham Woods. The sun may have a hard time piercing the canopy of ancient trees, but a new, artificial kind of wavelength, wireless data, is flowing out of it in ever larger quantities to study a variety of plant and animal species. Researchers with Oxford’s Department of Zoology are monitoring individual mice and voles to capture behavioural and evolutionary responses to environmental change. With the spectre of climate change looming on the horizon, the results of this study, and of others ongoing at field sites around the world, become more pressing. Whether ecosystems and the species that inhabit them can adapt in time remains an urgent question.

Recent observations of large declines in insect populations in western Germany (Hallmann et al. 2017) prompted headlines around the world about “ecological Armageddon”. They showed that species monitoring data is not mundane science but a potentially crucial early warning system for how local ecosystems will respond to climate change. With hundreds of scientists around the world collecting data on local populations, the principles behind Findable, Accessible, Interoperable and Reusable (FAIR) data are necessary to enable planetary-scale monitoring. These principles are already being implemented by biodiversity databases such as the Global Biodiversity Information Facility (GBIF). GBIF, which allows museum records to appear alongside smartphone photos shared by citizen scientists, demonstrates the benefits of FAIR data in practice, but the principles can be extended even further.

In practice, FAIR data should also integrate additional sources of open data to provide advance warning, rather than leaving scientists to react after the fact. For example, in 2018 scientists reported a significant decline in the world’s largest king penguin colony, on Île aux Cochons, after a 30-year gap between studies (Weimerskirch et al. 2018). However, the data used in the new study (satellite images) had been openly available for some time before the decline was spotted. This shows where FAIR data practice can still improve: open data only provides early warning if someone, or something, is analysing it.

Monitoring species can be labour-intensive, time-consuming and costly. FAIR data offers several practical ways to overcome these challenges, such as making new sources of data, like mobile citizen science applications and satellite imagery, interoperable with existing scientific monitoring. In their paper in Scientific Data, Wilkinson et al. (2016) argue that “humans increasingly rely on computational agents to undertake discovery and integration tasks on their behalf”. While global databases such as GBIF continue to expand, it is crucial that computational agents are developed to take advantage of data that already exists in satellite image archives and museum catalogues. If Google Maps can count new houses in real time from satellites, scientists can develop a platform for counting penguins. If algorithms analyse these additional data streams, the picture may become clear enough to give scientists advance warning of how species are responding to environmental change, so that they are not caught by surprise.


Jory Fleming

Graduate Researcher, University of Oxford
