Competitive, crowd-sourced computational approaches to translational medicine and systems biology data

Reflections on the DREAM Challenges and EPIDEMIUM workshop, 19–20 April 2018, Paris, France.


Organised by DREAM Challenges founder Gustavo Stolovitzky, together with Julio Saez-Rodriguez, Pablo Meyer, Elise Blaese and EPIDEMIUM organiser Olivier de Fresnoye, the DREAM Challenges and EPIDEMIUM workshop was held as a satellite of the RECOMB 2018 conference.

DREAM Challenges and EPIDEMIUM organise computational biology data analysis competitions that encourage computer science and machine learning (ML) experts to work on the many ‘big’ datasets produced by the biomedical community, with the aim of solving specific real-world problems. For example, a recent DREAM challenge focused on polypharmacology: participants were asked to predict compounds that bind multiple targets, using data made available for the challenge alongside other publicly available datasets.

Competitors are given access to a number of datasets with which to train their algorithms, with a distinct dataset (same attributes, different data) held back to test the performance of each team’s efforts. Keeping this test dataset private was discussed as an important part of how the challenges are run: in one prior challenge where the test data was mistakenly made available, there was evidence of participants overfitting their algorithms to it. The organizers run the challenges with the overarching goal of improving human health, but as there can be substantial financial rewards for winning teams, it was noted that not all participants share the same motivations.
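The value of a private held-out test set can be illustrated with a minimal sketch (a toy example, not taken from any actual challenge): a “model” that simply memorizes its training data scores perfectly on the data it has seen, but no better than chance on unseen data, which is exactly what a private test split is designed to expose.

```python
import random

random.seed(0)

# Toy dataset: x in [0, 1), true label is 1 when x > 0.5, with 10% label noise.
def make_data(n):
    data = []
    for _ in range(n):
        x = random.random()
        y = 1 if x > 0.5 else 0
        if random.random() < 0.1:  # flip 10% of labels
            y = 1 - y
        data.append((x, y))
    return data

train = make_data(200)
test = make_data(200)  # held back, like a challenge's private test set

# An overfit "model": memorize training points, guess randomly otherwise.
lookup = {x: y for x, y in train}
def memorizer(x):
    return lookup.get(x, random.randint(0, 1))

# A sensible model: a simple threshold at 0.5.
def threshold(x):
    return 1 if x > 0.5 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print("memorizer train/test accuracy:",
      accuracy(memorizer, train), accuracy(memorizer, test))
print("threshold train/test accuracy:",
      accuracy(threshold, train), accuracy(threshold, test))
```

The memorizer is perfect on the training data but near chance on the held-out set, while the simple threshold model generalises; if the test data leaked, a team could tune toward the memorizer’s behaviour without anyone noticing.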

In the DREAM Challenges, the competitive phase is followed by a collaborative phase, allowing competitors to learn from each other, further refine their algorithms and, for the most engaged, gain the opportunity to co-author a paper on the challenge. This is a key part of the crowd-sourced element of these challenges, deliberately intended to reduce bias as different teams’ algorithms are refined collaboratively. The collaborative phase also mirrors the way ML algorithms are developed, through a continuous cycle of improvement.

This year the workshop also included an interactive session on how these challenges are run, with the organizers genuinely interested in learning how they might improve the experience for challenge participants.

I was invited to present on the work of Scientific Data (my slides), as access to high-quality data underpins the success (or otherwise) of computational biology efforts. Scientific Data’s emphasis on publishing technically sound research data, in a manner which aims to maximise data reuse, is greatly appreciated by this community, and we have published several challenge datasets since the journal’s launch.

The workshop was a chance for winners and runners-up of recent challenges to present their work to the community, to learn about the latest data types being generated by biomedical researchers, and to discuss potential challenge questions for the following year. I thank the organizers for inviting Scientific Data to take part; it was a truly fascinating and very enjoyable two days.

Varsha Khodiyar, Ph.D.

Data Curation Manager, Springer Nature

As part of the Research Data team at Springer Nature, Varsha leads the data curation team and contributes to the design, development and delivery of the company’s research data training workshops. She is also responsible for curating and maintaining the Scientific Data and Springer Nature recommended repository lists. Varsha is an Executive Advisor of, a member of CODATA’s International Data Policy committee, programme chair for the Better Research through Better Data conference series, and a co-author of the TRUST principles.