Summarization is a communication strategy we use every day. We routinely take what we know and distill it for others. Regardless of profession, everyone uses and relies on summarization. This is particularly true when it comes to health information. People frequently consume information about their health, and they use that information to help guide the decisions they make about exercise, nutrition, medical care, medical treatments, and many other areas. In the hospital or clinic, a medical professional is on hand to provide information and answer a patient’s questions. But at home, many people readily turn to the internet, where there is a large quantity of unreliable or confusing information. Given that it can be very difficult to find accurate and reliable health information online, the question we are most concerned with is this: how can we make it easier for people without extensive medical knowledge to answer their questions? This is where summarization comes in. In fact, a summary of a long webpage or technical source of information may be one of the best ways to help people find the health information they are searching for.
If summarization of health information is so useful, how can we make summaries available to every user who wants one? It’s impossible to have humans manually write a summary in response to every search query, and fortunately, we may not need an army of summarizers madly typing away to accomplish this. Deep learning algorithms have made many advancements in the past few years, including in the field of text generation and automatic summarization. It may be surprising to learn that neural networks are capable of summarizing scientific or medical text, but, depending on the architecture of the model, these algorithms can do exactly that. We wanted to take advantage of these recent developments in deep learning to automatically create summaries for consumers of online health information. However, to evaluate any deep learning algorithm, data is required. This is why we created our MEDIQA-AnS collection. It consists of questions asked by users of the Consumer Health Information Question Answering (CHiQA) system, web pages from reliable sources, and manually written summaries of these web pages. Having this data available means that we can tell an algorithm to summarize the documents in the MEDIQA-AnS collection, and then compare the summaries it generates to the ones written by humans, using automatic metrics to rate its performance. In doing so, we can develop a reasonable idea of the ability of any algorithm to summarize health information in response to a user’s question. We can’t implement any old summarization algorithm in an artificially intelligent medical question answering system without risking that the algorithm will produce unreliable, incorrect, or just plain nonsensical summaries. Essentially, knowing how well an algorithm does on a specific summarization task is an important step towards generating summaries that real users can use to answer their health questions!
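To make the idea of comparing a generated summary to a human-written one more concrete, here is a minimal Python sketch of one simple automatic metric: unigram overlap, which is the idea behind ROUGE-1. This post does not say which metrics we used, so treat the choice of metric as an assumption made for illustration. The function names and the two example sentences are hypothetical and are not taken from MEDIQA-AnS.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_overlap_scores(generated, reference):
    """Return (precision, recall, f1) for unigram overlap between two texts.

    This is a simplified, ROUGE-1-style score (an assumption for this sketch),
    not a full ROUGE implementation: no stemming and no stopword handling.
    """
    gen_counts = Counter(tokenize(generated))
    ref_counts = Counter(tokenize(reference))
    # Clipped overlap: a word counts only as often as it appears in both texts.
    overlap = sum((gen_counts & ref_counts).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(gen_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example texts, written for this sketch only.
reference = "Drink plenty of fluids and rest to recover from the flu."
generated = "Rest and drink fluids to recover from the flu."

precision, recall, f1 = unigram_overlap_scores(generated, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A score like this only rewards word overlap with the human summary, so it is a rough proxy for quality rather than a guarantee that a summary is correct or readable.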
As an example of an actual use case of MEDIQA-AnS, we were recently able to use the data to validate the performance of a new summarization algorithm, Bidirectional and Auto-Regressive Transformers (BART). The results from this evaluation indicated that we would be able to use BART in a challenge organized by the Text Analysis Conference (TAC) at the National Institute of Standards and Technology. This year, the TAC challenge is to answer ad-hoc questions about COVID-19 using online health information as the source material. Because we knew (from our analysis with MEDIQA-AnS) that the summaries produced by BART were often relevant to health questions, we were able to submit summaries generated by BART as the answers to the questions about COVID-19. Without the MEDIQA-AnS dataset, we would not have been able to analyze the automatically generated summaries, and would not have been able to rely on the summaries BART produced.
The dataset can be accessed at this Open Science Framework repository.
The paper is published here.