Recent Springer Nature hack day

What was achieved and what we learned at our research data hack day

More than 40 researchers, software developers and publishing professionals collaborated at the latest Springer Nature hack day to create and design tools to help researchers:

  • find data and software that support publications
  • identify the best data repositories
  • understand trends in global knowledge transfer

Of nine teams that formed during the day, four put themselves forward to be the winning project. The winning team created a working demonstration of a browser plug-in - “D’oi” - that alerts readers of a journal article if datasets cited in the article have been updated. In keeping with the themes of the day, this kind of tool aims to promote reproducibility and increase the visibility and reuse of datasets. We already have tools, such as Altmetric and CrossMark, that tell us about updates to, and activity around, journal articles so, if we’re to treat data as first-class objects in scholarly communication, we should have equivalent systems for data.

To produce a functional prototype for D’oi the team focused on two articles published in Scientific Data that have supporting datasets in the repository Figshare. While dataset updates only occur in a fraction of published datasets, to create the plug-in the team first needed to programmatically identify the existence of data links/citations in an article, and enable a particular repository to be identified - both criteria for linking scholarly articles to their supporting datasets.

“D’oi” browser plug-in in operation on a Scientific Data article. It displays clickable links to an article's datasets and indicates (in orange) when datasets have been updated.
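To make the two linking criteria concrete, here is a minimal sketch in Python of how DOIs might be picked out of article text and matched to a repository. It is not the team's code. The simplified DOI pattern, the prefix-to-repository table and the function names `find_dataset_dois` and `is_outdated` are all assumptions made for illustration, and the latest version number is passed in by hand rather than fetched from a repository.

```python
import re

# Simplified DOI pattern (assumption): good enough for a sketch,
# not a full validator of DOI syntax.
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")

# Illustrative map from DOI prefix to repository (assumption; a real
# tool would need a maintained registry of repositories).
REPOSITORY_PREFIXES = {
    "10.6084": "figshare",
    "10.5281": "Zenodo",
    "10.5061": "Dryad",
}


def find_dataset_dois(text):
    """Return (doi, repository) pairs for DOIs with a known repository prefix."""
    found = []
    for match in DOI_RE.finditer(text):
        # Drop punctuation that often trails a DOI at the end of a sentence.
        doi = match.group(0).rstrip(".,;)")
        prefix = doi.split("/", 1)[0]
        repository = REPOSITORY_PREFIXES.get(prefix)
        if repository is not None:
            found.append((doi, repository))
    return found


def is_outdated(cited_doi, latest_version):
    """True if the cited DOI ends in '.vN' and N is below latest_version.

    DOIs without a version suffix are reported as not outdated, because
    the cited version cannot be determined from the DOI alone.
    """
    match = re.search(r"\.v(\d+)$", cited_doi)
    if match is None:
        return False
    return int(match.group(1)) < latest_version
```

A plug-in along these lines would run `find_dataset_dois` over the article page, look up the current version of each dataset from its repository, and highlight any link for which `is_outdated` returns `True`.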

The Springer Nature SciGraph was hacked by the second-place team - which included programmers from the Knowledge Media Institute (KMI), who are interested in the flow of knowledge across continents. This team focused on the metadata of publications in conference proceedings and created dashboards visualising these metadata. With more time, they anticipate being able to use their project, Venue-centric trends, to identify predatory publishers and explore other, more complex questions. Read KMI’s blog about the day here and access their code here.

Representatives of DataCite, University of Oxford and Springer Nature addressed a problem that many researchers face - not knowing where to deposit their data. Their “ideas hack” defined the requirements and features for a data repository selection tool. Follow-up meetings are already scheduled with team members and other collaborators to continue developing this concept.

Scientists from the Alan Turing Institute used the SpringerLink full-text API - made available without a subscription for the event - and discovered that software citation in SpringerLink content has grown substantially in the last five years. They charted full-text references to GitHub (the most popular code repository used by many researchers) in SpringerLink content.

Graph showing mentions of GitHub in articles published on SpringerLink
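The counting behind a chart like this can be sketched in a few lines of Python. This is an illustration only, not the Alan Turing Institute team's code. It assumes the full text has already been retrieved and is available as hypothetical `(year, full_text)` pairs, and the function name `articles_mentioning_github_by_year` is made up for this example.

```python
import re
from collections import Counter

# Match "github" as a whole word, in any capitalisation; this also
# catches URLs such as https://github.com/user/repo.
GITHUB_RE = re.compile(r"\bgithub\b", re.IGNORECASE)


def articles_mentioning_github_by_year(articles):
    """Count, per publication year, the articles that mention GitHub.

    `articles` is an iterable of (year, full_text) pairs (an assumed
    input shape, not the format returned by any particular API).
    Each article is counted once, however many mentions it contains.
    """
    counts = Counter()
    for year, full_text in articles:
        if GITHUB_RE.search(full_text):
            counts[year] += 1
    # Return the years in ascending order, ready for plotting.
    return dict(sorted(counts.items()))
```

Counting articles rather than individual mentions keeps a single article with many GitHub links from skewing the trend.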

We also documented other ideas we didn’t have time to work on, and heard presentations from attendees who explored ideas and hacks that were less fruitful - highlighting the importance of sharing “failures” (or negative/null results) as much as successes.

In line with our code of conduct for the day, outputs from the day’s projects are publicly available (follow links above), which should stimulate further discussion and collaboration around these ideas, and the themes of the event.

One of our goals was to create an open and collaborative environment, which our attendee survey suggests we achieved: everyone who completed our post-event survey (response rate 50%) said they would recommend the event to a colleague. We also learned where we can do better in future events. For example, we may be able to enable creation of more complete project outputs by narrowing the scope of the event and focusing on fewer content and data sources. We’re already gathering ideas for future event themes, with text and data mining, natural language processing, and discipline-specific research problems all being considered.

Big thanks go to all the attendees for making the event so productive, and at Springer Nature we’re looking forward to exploring the day’s hacks further - openly, with the community.

Iain Hrynaszkiewicz

Publisher, Open Research, PLOS

Iain Hrynaszkiewicz is Publisher, Open Research at Public Library of Science (PLOS), where he leads the conceptualisation and development of new products and services that add value to the PLOS portfolio by supporting and enabling open science. Iain was previously Head of Data Publishing at Springer Nature where he developed and implemented research data policies and services, and was publisher of Nature Research Group’s Scientific Data journal. He has also been Outreach Director at Faculty of 1000 (F1000), and spent seven years at the first commercial open access publisher BioMed Central (BMC) in a variety of editorial, publishing and product/policy development roles. Iain is part of several research/publishing community projects related to data sharing and reproducible research. He founded and is co-chair of an Interest Group in the Research Data Alliance (RDA) that is setting standards for journal research data policy globally, and founder of the annual early-career researcher conference, Better Science through Better Data. He has published numerous papers related to data sharing, open access, and the role of publishers in reproducible research - one of which has been cited nearly 200 times.