What was achieved and what we learned at our research data hack day

Dec 08, 2017
More than 40 researchers, software developers and publishing professionals collaborated at the latest Springer Nature hack day to create and design tools to help researchers:

  • find data and software that support publications
  • identify the best data repositories
  • understand trends in global knowledge transfer

Of nine teams that formed during the day, four put themselves forward to be the winning project. The winning team created a working demonstration of a browser plug-in - “D’oi” - that alerts readers of a journal article if datasets cited in the article have been updated. In keeping with the themes of the day, this kind of tool aims to promote reproducibility and increase the visibility and reuse of datasets. We already have tools, such as Altmetric and CrossMark, that tell us about updates to, and activity around, journal articles so, if we’re to treat data as first-class objects in scholarly communication, we should have equivalent systems for data.

To produce a functional prototype for D’oi, the team focused on two articles published in Scientific Data that have supporting datasets in the repository Figshare. Although only a fraction of published datasets are ever updated, to create the plug-in the team first needed to programmatically detect data links or citations in an article and identify the repository that holds the data - both prerequisites for linking scholarly articles to their supporting datasets.

The “D’oi” browser plug-in, in operation on a Scientific Data article, displays clickable links to an article's datasets and indicates (in orange) when datasets have been updated.
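The identification step described above (finding data links in an article and working out which repository they point to) might be sketched roughly as follows. This is a minimal illustration rather than the team's actual code: the function names and the prefix lookup table are hypothetical, the DOI pattern is a loose heuristic, and it assumes Figshare-style DOIs that begin with 10.6084/m9.figshare and carry an optional ".vN" version suffix.

```python
import re

# Loose heuristic for DOIs in free text; real DOIs may contain other characters.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

# Hypothetical lookup table from DOI prefix to repository name.
REPOSITORY_PREFIXES = {
    "10.6084/m9.figshare": "Figshare",
}

# Assumption: a versioned dataset DOI ends in ".v" followed by a number.
VERSION_SUFFIX = re.compile(r"\.v(\d+)$")


def find_dataset_links(text):
    """Return (doi, repository) pairs for DOIs that match a known repository prefix."""
    results = []
    for match in DOI_PATTERN.finditer(text):
        # Drop punctuation that commonly trails a DOI at the end of a sentence.
        doi = match.group(0).rstrip(".,;)")
        for prefix, repository in REPOSITORY_PREFIXES.items():
            if doi.lower().startswith(prefix):
                results.append((doi, repository))
    return results


def cited_version(doi):
    """Return the version number cited in the DOI, or None if it has no version suffix."""
    match = VERSION_SUFFIX.search(doi)
    return int(match.group(1)) if match else None
```

A plug-in could then compare the cited version with the latest version reported by the repository and flag the link when they differ; how the latest version is obtained is left out here because it depends on the repository.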

The second-place team, which included programmers from the Knowledge Media Institute (KMI) who are interested in the flow of knowledge across continents, hacked the Springer Nature SciGraph. This team focused on the metadata of publications in conference proceedings and created dashboards visualising these metadata. With more time, they anticipate that their project, Venue-centric trends, could help identify predatory publishers and explore other, more complex questions. Read KMI’s blog about the day here and access their code here.

Representatives of DataCite, University of Oxford and Springer Nature addressed a problem that many researchers face - not knowing where to deposit their data. Their “ideas hack” defined the requirements and features for a data repository selection tool. Follow-up meetings are already scheduled with team members and other collaborators, such as FAIRSharing.org, to continue developing this concept.

Scientists from the Alan Turing Institute used the SpringerLink full-text API - made available without a subscription for the event - and discovered that software citation in SpringerLink content has grown substantially in the last five years. They charted full-text references to GitHub (the most popular code repository, used by many researchers) in SpringerLink content.

Graph showing mentions of GitHub in articles published on SpringerLink
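A count like the one charted above could be produced along these lines. This is only a sketch under stated assumptions: it takes already-retrieved (year, full text) pairs as input rather than calling the SpringerLink API, it counts articles that mention github.com at least once (the team may have counted differently), and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumption: a reference to GitHub appears in the full text as a github.com URL.
GITHUB_PATTERN = re.compile(r"github\.com", re.IGNORECASE)


def github_mentions_by_year(articles):
    """Count, per publication year, the articles whose full text mentions github.com.

    `articles` is an iterable of (year, full_text) pairs (hypothetical input format).
    """
    counts = Counter()
    for year, full_text in articles:
        if GITHUB_PATTERN.search(full_text):
            counts[year] += 1
    # Return the years in ascending order so the result is ready to plot.
    return dict(sorted(counts.items()))
```

Counting articles rather than individual mentions avoids giving extra weight to a single paper that links to many repositories.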

We also documented other ideas we didn’t have time to work on, and heard presentations from attendees who explored ideas and hacks that were less fruitful - highlighting the importance of sharing “failures” (or negative/null results) as much as successes.

In line with our code of conduct for the day, outputs from the day’s projects are publicly available (follow links above), which should stimulate further discussion and collaboration around these ideas, and the themes of the event.

One of our goals was to create an open and collaborative environment, which our attendee survey suggests we achieved: everyone who completed our post-event survey (response rate 50%) said they would recommend the event to a colleague. We also learned where we can do better in future events. For example, we may be able to enable creation of more complete project outputs by narrowing the scope of the event and focusing on fewer content and data sources. We’re already gathering ideas for future event themes, with text and data mining, natural language processing, and discipline-specific research problems all being considered.

Big thanks go to all the attendees for making the event so productive, and at Springer Nature we’re looking forward to exploring the day’s hacks further - openly, with the community.

Iain Hrynaszkiewicz

Head of Data Publishing, Springer Nature

Iain Hrynaszkiewicz is Head of Data Publishing at Springer Nature where his team develop research data policies and services across Springer Nature. He is also publisher of Nature Research Group’s Scientific Data journal. He leads and contributes to community initiatives and working groups on data sharing and reproducible research and is Programme Chair of the conference, Better Science through Better Data. He has published numerous articles related to data sharing, open access, and open data - one of which has been cited more than 150 times.

