As a winner of the 2018 Better Science Through Better Data writing competition, I attended the Better Science Through Better Data conference, and this is my report from the day. My winning article can be found here.
The 5th Better Science Through Better Data conference took place at the Natural History Museum in London on 14th November. The main theme of the day was to promote open and reliable data and introduce the role of the data generalist, a profession revolving around sharing data.
Rebecca Boyles, a data scientist at the National Institute of Health Sciences, was the first keynote speaker. Her talk focussed on the radical transformation of biomedical research over the last 30 years. The first step towards this was the Human Genome Project, which began in 1990 and whose results were published in Nature in 2001. This was a large-scale project involving collaborators from across the globe. Over the course of the project, researchers faced difficulties storing, integrating, analysing, comparing and sharing data. Researchers would have been grateful for a fast and reliable way of sharing data, like the internet!
The next significant breakthrough in genetic technology was the development of high-throughput screening techniques, which were fully integrated in 1999. These methods produce large amounts of data, all of which need to be analysed. Because data were difficult to share, the amount of analysis that could be done was restricted, which demonstrated the need for an online platform for sharing data.
Boyles insists that data is the world’s most valuable resource. The research group IDC predicts that 163 zettabytes of data will be produced and stored each year by 2025. This is a 10-fold increase from the current data creation rate of 16.3 zettabytes. This data will need to be collected, stored and analysed, but according to Boyles, the platforms available for sharing scientific data are underutilised. While Boyles estimates that half of all data will be in the cloud in the next five to ten years, the scientific community needs to take measures to ensure that data is shared. Boyles is actively working towards making data openly available so that others can analyse it, combine it with their own work and further scientific discovery.
Boyles continued by discussing why some researchers are reluctant to share data openly. She highlighted that, for her, the biggest barriers to data sharing are privacy concerns and the legal issues surrounding the sharing of restricted data. She also accepts that researchers often find data management planning expensive and time-consuming, and that they sometimes lack the expertise required for effective data sharing.
To promote data sharing across the scientific community, Boyles encourages the use of the online platform Data Commons, created by Vivien Bonazzi, which claims to ‘foster the development of a digital ecosystem’ and allows multiple participants to connect and share data.
Careers are being created that focus entirely on data sharing and communication. An increasingly popular role in research, says Boyles, is that of the data generalist. A data generalist takes on all responsibility for the sharing of data and needs critical thinking skills to integrate, evaluate and communicate the benefits and drawbacks of providing open data. They also have a role in data analysis. The emergence of this role should encourage better sharing of data.
Looking to the future, Boyles predicts the formation of teams around the data commons: researchers will meet through the online platform and form teams around a particular set of data. Data analysis will be performed as a service, and digital objects will be FAIR, meaning that data is Findable, Accessible, Interoperable and Reusable. Consequently, shared data will be more usable and easier to locate. Data generalists will identify the limitations of data and communicate its value to the scientific community. Most importantly, data will continue to be the most valuable resource.