Artificial intelligence (AI) is increasingly used in medicine. One of the key factors for the successful development and implementation of new methods is open research data. In this context, we released one of the largest datasets for capsule endoscopy, which can be used to develop methods that detect diseases in the gastrointestinal tract. We hope that this data will encourage other researchers to work on this important topic and foster open and reproducible science.
Visual inspection of the inside of the human body is called endoscopy, a common diagnostic technique used in several medical specialties. The gastrointestinal (GI) tract can be examined from above by esophagogastroduodenoscopy, which visualizes the esophagus, the stomach, and the duodenum, the first part of the small intestine. Another option is to begin the examination from the other end with a rectoscopy, sigmoidoscopy, or colonoscopy, depending on how far up you want to examine. However, parts of the small intestine, particularly the jejunum and the upper parts of the ileum, remain hard to reach. There are ways of reaching these parts of your bowel as well, and the least invasive is to swallow a small, pill-like video capsule endoscope (VCE). The capsule contains one or more small cameras, a light source, and a battery. After you have swallowed it, it moves passively through your GI tract, capturing video of your insides all the way from your mouth to your anus. As a result, the video footage lasts for hours, contains plenty of irrelevant content, and is time-consuming for doctors to assess. This makes it a well-suited task for AI-based computer analysis.
Open-access data from capsule endoscopy is scarce; few large, open datasets exist. In this respect, our Kvasir-Capsule dataset is one of the largest image and video collections of capsule endoscopy footage available today. The data was collected during actual examinations at Bærum Hospital in Norway, and experienced GI endoscopists labelled parts of it. Kvasir-Capsule consists of 117 videos from which more than 4.7 million image frames can be extracted. We have labelled and medically verified 47,238 frames with a bounding box around findings representing 14 different classes. In addition to these labelled images, the dataset includes 4,694,266 unlabelled frames, which can be explored for unsupervised or self-supervised learning.
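To put the frame counts above in proportion, here is a minimal Python sketch that adds them up and computes the labelled share. It uses only the numbers stated in the text; the function and variable names are illustrative and are not part of the dataset or any tooling that ships with it.

```python
# Frame counts as stated in the text above.
LABELLED_FRAMES = 47_238
UNLABELLED_FRAMES = 4_694_266


def summarize(labelled: int, unlabelled: int) -> dict:
    """Return the total frame count and the share of frames that are labelled.

    Hypothetical helper for illustration only.
    """
    total = labelled + unlabelled
    return {
        "total_frames": total,
        "labelled_share": labelled / total,
    }


summary = summarize(LABELLED_FRAMES, UNLABELLED_FRAMES)
print(f"Total frames:   {summary['total_frames']:,}")
print(f"Labelled share: {summary['labelled_share']:.2%}")
```

The total comes to 4,741,504 frames, of which roughly one percent are labelled. This is why the unlabelled frames matter for unsupervised and self-supervised approaches.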
We believe that the Kvasir-Capsule dataset can play a valuable role in developing better algorithms to reach the true potential of VCE technology. AI technology has the potential to sort the video footage and present the doctor with the crucial information needed for a correct diagnosis. Streamlining the assessment of capsule endoscopy footage will save doctors time and make the technique more usable in everyday clinical practice.
The Kvasir-Capsule dataset can be found at OSF: https://osf.io/dv2ag/
The paper describing the dataset can be found at