Metabarcoding is becoming an indispensable tool for inferring and monitoring biodiversity across a wide range of ecosystems. However, one of the biggest barriers to the uptake and applicability of this method is the creation or availability of the reference sequence databases required for taxonomic inference. The choice of reference database can strongly influence the ecological inferences drawn from metabarcode data. Researchers can either build custom reference databases, which requires time and expertise, or use pre-existing ones, which are sometimes poorly suited to the study objectives.
Here we are pleased to present the MARES (MARine Eukaryote Species) metabarcoding workflow and reference databases for use in marine metabarcoding studies. MARES provides a transparent and reproducible pipeline for generating custom reference databases in a standardised manner. To demonstrate the MARES workflow, and to meet our own research needs, we have created and curated a comprehensive sequence reference database for marine eukaryotes, combining sequences from both GenBank and BOLD under a consistent taxonomy.
Arguably, the marine biome presents the ultimate challenge for the application of metabarcoding, including the generation of appropriate sequence reference databases. More than 80% of the world’s biodiversity is found in the ocean, comprising approximately 60 of the 82 known eukaryote phyla, and it is estimated that only 9–10% of marine species have been described. The ongoing debate over which barcode regions best capture this diversity, the continual generation of new species-specific barcodes, and the refinement of taxonomies together require a way to update metabarcode sequence reference databases continuously and easily. Our own research, which uses metabarcoding to characterise the biodiversity of a hyperdiverse marine community, required a sequence reference database to taxonomically identify our barcode sequences and make biological sense of them.
In our community samples from kelp holdfasts, we estimated that 18 phyla were present, with varying levels of taxonomic description and of barcode sequence representation in repositories, and with the possibility of several cryptic taxa. We quickly realised the importance of finding an appropriately curated reference database for our study system, and when the existing options did not satisfy us, we encountered the complexity of building a custom one. Several existing scripts help to retrieve available sequences for target taxa from sequence repositories and merge them in subsequent steps, but these were not readily interoperable, nor accessible to a novice ‘metabarcoder’ lacking the technical background or computing skills to chain the disparate steps together. For example, one crucial intervening step we have optimised in MARES is the curation and normalisation of the reference sequences into a standardised taxonomy before they are used in taxonomic assignment software. Much as Victor Frankenstein assembled his creature, MARES contributes original scripts and intervening steps that stitch together the many excellent scripts and routines already provided by the metabarcoding community into an accessible and complete workflow. For those interested in metabarcoding marine eukaryotes using cytochrome c oxidase subunit I (COI), the MARES reference databases we have curated provide the most comprehensive and up-to-date resource for your research. For those working in other systems, or looking to build their own sequence reference databases, we provide our workflow along with a step-by-step tutorial to help you generate your own.
MARES means “seas” in Spanish, evoking the vast and largely unknown blue ocean that surrounds us. Thanks to extraordinary advances in DNA technologies, the door is now open to discovering the ocean’s vast diversity and to understanding temporal and spatial change in these complex ecosystems. We invite others to use MARES and to help update and maintain it, so that it remains a useful resource through which metabarcoding studies can deliver important gains in knowledge.
Link to the published article: https://rdcu.be/b5o4i
If you have any questions about the MARES dataset or pipeline, please feel free to contact the authors:
Vanessa Arranz (@vanearranz2)
William S. Pearman (@WilliamPearman)
J. David Aguirre