#SciData19 Writing Competition: Winning Entry #4

We are proud to publish the fourth of this year's four winning entries for the Better Science through Better Data writing competition. Congratulations to Thu (Mi) Nguyen!

Question: How should Findable, Accessible, Interoperable and Reusable (FAIR) data work in practice?

Answer:

Thu Nguyen - University of Illinois

The concept of FAIR (findability, accessibility, interoperability, and reusability) data was first introduced in 2016 [2], but its popularity has been slow to grow. In my opinion, the main challenges for FAIR data in practice are the limited public awareness of the topic and the idea of interoperability itself. Jon Brock, writing in Nature Index, reported on the State of Open Data survey, which found that only about 19.2% of researchers are familiar with the FAIR principles and about 30.7% have only heard of them. Because FAIR aims to incorporate and integrate data from different fields of research, it cannot be implemented efficiently by a small group of people. The more people who are familiar with the idea, the more will be willing to try it, and the more effort can go into implementing FAIR.

Additionally, significant effort is needed to define "interoperable." In the same survey, 42% of respondents said that the "I" in FAIR was unclear to them. Does interoperable mean that all fields of research should be able to cross-talk? How would we pick a common data type for all research fields? Even within mass spectrometry alone, different vendors use different data formats and software, which most of the time cannot talk to each other. I imagine it would be difficult, and quite unrealistic, for all scientists to reach a consensus. Understanding each data type from a different subfield can be a challenge in itself, which raises a different but related challenge for the implementation of FAIR. Kate LeMay, a senior research data specialist at the Australian Research Data Commons, commented in an interview with Nature Index that each field of science has “different culture and requirement for data and metadata.” On the other hand, Wilkinson et al. noted in the original article that it is unsustainable to create a computer parser for every data type.
Given how far the current situation is from the ideal of a machine-interoperable future that unites all fields of science, that goal sounds almost impossible. Interoperability within each subfield of research, however, may be more realistic. Many new data repositories are being created, and I think many of them are close to meeting the FAIR criteria. Take natural products mass spectrometry, for example. One accessible, interoperable, and reusable database is Global Natural Products Social Molecular Networking (GNPS), which serves as an open-access knowledge base of tandem mass spectrometry data for natural product scientists. Data from different laboratories, studies, and target subjects can all be incorporated, enabling continuous identification of published compounds. Additionally, an entire study dataset can be published in the Mass Spectrometry Interactive Virtual Environment (MassIVE) data repository, which allows users to browse and reanalyze published datasets. It would be hard to change the data language or file formats of this entire field to make them compatible with microarray or next-generation sequencing (NGS) data. In fact, I would argue that doing so would not be useful. So instead of spending time advocating a common data format across all of research, we can focus on bringing each field closer to the ideal FAIR repository in its own way. Related fields can then find a common way to communicate their findings. For instance, genomics and proteomics can communicate through Gene Ontology pathway analysis. In conclusion, to put FAIR data into practice, I would keep these requirements under discussion at conferences, giving each research society the chance to build its own FAIR repository or to come together with others to solve the problem of interoperable data.

1. Landry, Jonathan J M et al. “The genomic and transcriptomic landscape of a HeLa cell line.” G3 (Bethesda, Md.) vol. 3,8 1213-24. 7 Aug. 2013, doi:10.1534/g3.113.005777 

2. Wilkinson, M.D., et al., The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 2016. 3: p. 160018.

Don't forget to register for the live stream of Better Science through Better Data.

Meet the other writing competition winners here.
