Earlier this month I attended the International Society for Biocuration’s annual conference, Biocuration2019. It was my second ISB conference; my first was the third ISB meeting back in 2009! Having been a curator for the HUGO Gene Nomenclature Committee (HGNC) and the Gene Ontology (GO) for several years, I have previously worked with many of the organizing committee members and several of the attendees, so this conference was something of a homecoming for me!
I started Biocuration2019 with a pre-conference workshop, ‘Mapping the landscape of biocuration - where are the biocurators and what do they need?’. This workshop came about as a result of an ELIXIR Implementation study aiming to identify communities of biocurators and their training needs. The workshop organisers began by presenting an overview of the results of a pilot survey they had undertaken of the biocuration community. Key things I noted from the preliminary survey results:
- Inclusion of ‘curator’ in a job title is unusual; biocurators are given a wide variety of titles. This makes it difficult for members of the biocuration community to reach out to each other and find relevant positions. (At Springer Nature we use ‘Research Data Editor’ as the official title for our curators, as this title aligns more closely with existing career structures within the company.)
- 1:1 training and self-training were considered more impactful than formal training courses. In my view, this is likely due to the very specific nature of the job that each biocurator carries out: there are unlikely to be many other people (or in some cases anyone else) doing the same job. Training requirements are therefore going to be very specific for each role.
- The top 3 skills as identified by the biocurators completing this survey were: attention to detail, curiosity, and domain knowledge. With the exception of domain knowledge, these are ‘soft’ skills, and it remains an open question as to how ELIXIR can support the development of these skills in the next generation of biocurators.
At the time of writing, the survey was still open for completion.
Each day of the conference began with an impactful and inspirational keynote. Sean O’Donoghue spoke about the impact of data visualization on the interpretation of data, and suggested D3.js as a tool that people who are not data visualization experts can use to visualize their data. Paul Sternberg heads WormBase and described his vision of ALL data (including non-confirmatory data) being captured in knowledgebases*. WormBase relies on a team of curators to read the literature and capture pertinent findings in a machine-accessible way, but Paul pointed out that this is made difficult by the literature becoming increasingly complex.
*It became clear to me at this meeting that, as a community of research data professionals, we should take care to differentiate between resources that archive and store primary data, i.e. data repositories, and resources that present information which has been gathered (curated) from the literature either manually or in some automated way, i.e. knowledgebases.
The third keynote was given by Ellen McDonagh and provided an insight into crowdsourcing expert feedback in her role as Head of Curation for Genomics England. Ellen shared her experience of the time-consuming nature of curation, even when it is crowdsourced from experts. It takes time to chase people who are themselves particularly busy due to the nature of their jobs, in addition to the time taken for the actual curation. The final keynote was given by Susanna-Assunta Sansone on the work of the Data Readiness Group she runs at the University of Oxford, with particular emphasis on the work of the ISA tools and FAIRsharing.org teams to enable FAIR data. A key takeaway from Susanna’s talk: "connecting standards is not a technical problem, it's a social engineering problem".
I was invited by Jane Lomax (SciBite) and Yasmin Alam-Faruque (Eagle Genomics) to take part as a panellist in the Biocuration in Industry session they chaired. The panel Q&A was preceded by talks by biocurators from Nebion, Roche, Healx, Eagle Genomics and the Pistoia Alliance. It was interesting to hear how biocuration skills are increasingly required to clean and annotate data (both public data and in-house generated data) as part of pharmaceutical development, drug repurposing and the creation of knowledge discovery platforms.
At Springer Nature we now have several years of experience of data curation being part of the publishing process. The data journal Scientific Data has had a curator embedded within the editorial team since it was launched, and we have since expanded both the curation team and the number of curation services we offer to researchers wishing to publish their research data. Across several days of the conference, my colleague Tristan Matthews and I presented on Springer Nature’s interventions to encourage researchers to publish FAIR data (slides), our experience of engaging researchers with the curation process (poster) and our findings regarding the value of the curation services we provide (poster and slides).
With sincere thanks to the organizers for a highly enjoyable and productive four days, I conclude with my key takeaways from the conference:
- Data resources can be thought of as either knowledgebases (curated resources) or data repositories (location of primary research data).
- Provenance for data resources is important; we need to know where data / knowledge have originated.
- We are all responsible for our public databases. If you spot an error in a public resource, contact the resource’s managers and get it corrected.
- Data curation skills are increasingly required in non-academic / commercial enterprise settings.