A synthesis of bacterial and archaeal phenotypic trait data

Open-access code that merges 26 data sources, reconciles conflicting data and condenses multiple records into a single record per species.

The field of “trait ecology” has emerged over the past two decades. It grew mainly out of plant ecology, beginning from an interest in understanding the spread of ecological strategies across species. Up to the 1990s, discussion of ecological strategies revolved around concepts such as stress-tolerance and ability to compete. But such concepts proved hard to define and therefore also to measure. For example, there was no way to ask whether plant species in Cape Province tended to be more stress-tolerant than those in northern England, in the absence of an agreed method for measuring stress tolerance. Trait ecology solved this by positioning species along measurable axes such as seed mass, leaf mass per area (LMA) and potential height. These measures can be made on any species at any location, enabling direct comparison of species via common measures. The data synthesis described in our paper is a starting point for exploring trait axes in bacteria and archaea.

Global comparisons via quantitative traits have been notably productive. Major dimensions of variation have been characterized, and model species positioned against the constellation of variation (Díaz et al. 2016; McWilliam et al. 2018). Traits have been used as predictors for decomposition rates (Cornwell et al. 2008) and for growth and response to competition (Gibert et al. 2016; Kunstler et al. 2016). The communal TRY plant trait database (www.try-db.org) has found use in 291 publications so far, many from large collaborations with 10 or more authors from multiple research groups. An Open Traits Network (opentraits.org) aims to broaden this collaborative research style across all taxa (Gallagher et al. 2020).

Bacteria and archaea are different from plants in many ways, but they do have this in common: ecological strategies are largely discussed by reference to more or less abstract concepts such as the oligotrophy-copiotrophy spectrum. Such concepts are hard to define and measure, and thus present a problem similar to the one plant ecology faced in the 1990s. Accordingly, we initiated a part-time project within the framework of a small Macquarie University collaborative network called the Species Spectrum Research Centre. The project aimed to gather as much trait information on bacteria and archaea as we could find and to consolidate this information into a data frame that could easily be probed to explore different ecological questions.

Workshop at Macquarie University, November 2018. From left: Jennifer Martiny, Sasha Tetu, Phil Hugenholtz, Josh Madin, Frank the Bear, Daniel A Nielsen, TBK Reddy, Michael Gillings, Jemma Geoghegan, Mark Westoby and Andrew Bissett.

The core group consisted of Westoby (background in plant trait ecology), Gillings, Moore, Paulsen, Tetu and Nielsen (all microbiologists), and Madin, who is a data scientist and coral biologist. The group met weekly over several years, progressively discovering data sources and discussing issues, papers and preliminary analyses. Because many decisions had to be made in order to combine and process the data, we decided early that the data synthesis should take the form of open-access code. Data was imported and stored as it came from the original sources, and the various decisions made subsequently are all recorded in the code. These include reconciling units, merging columns that described the same trait in different words, correcting or removing observations that were assessed to be errors for one reason or another, and condensing multiple records into a single summary record per species. Because the code is accessible, users can add further data sources, modify any of these decisions as they see fit, and add new trait information as it becomes available.
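The kind of processing described above can be illustrated with a small sketch. It is not the project's actual code: the two sources, their column names, the unit conversion and the example values are all hypothetical, and the choice of the median for numeric traits and the most frequent value for categorical traits is an assumption made only for illustration.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical raw records from two sources (made-up names, units and values).
SOURCE_A = [
    {"species": "Escherichia coli", "cell_diameter_um": 0.5, "gram_stain": "negative"},
    {"species": "Escherichia coli", "cell_diameter_um": 0.7, "gram_stain": "negative"},
]
SOURCE_B = [
    {"species": "Escherichia coli", "width_nm": 600.0, "gram": "neg"},
    {"species": "Bacillus subtilis", "width_nm": 800.0, "gram": "pos"},
]

# Different words used for the same categorical value.
GRAM_SYNONYMS = {"neg": "negative", "pos": "positive"}


def harmonise_b(record):
    """Map source B's column names, units and vocabulary onto source A's."""
    return {
        "species": record["species"],
        "cell_diameter_um": record["width_nm"] / 1000.0,  # nanometres to micrometres
        "gram_stain": GRAM_SYNONYMS.get(record["gram"], record["gram"]),
    }


def condense(records):
    """Collapse multiple records into one summary record per species."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["species"]].append(record)

    summary = {}
    for species, rows in grouped.items():
        diameters = [r["cell_diameter_um"] for r in rows
                     if r.get("cell_diameter_um") is not None]
        stains = [r["gram_stain"] for r in rows if r.get("gram_stain")]
        summary[species] = {
            # Assumed rule: median for numeric traits.
            "cell_diameter_um": median(diameters) if diameters else None,
            # Assumed rule: most frequent value for categorical traits.
            "gram_stain": Counter(stains).most_common(1)[0][0] if stains else None,
            "n_records": len(rows),
        }
    return summary


merged = SOURCE_A + [harmonise_b(r) for r in SOURCE_B]
traits = condense(merged)
```

Keeping each decision in its own small function mirrors the point made above: anyone who disagrees with a rule, for example how conflicting values are summarised, can change that one step and rerun the merge.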

To date, the data frame contains mostly phenotypic traits such as cell diameter and length, maximum growth rate, oxygen use, Gram stain and growth temperature, but also some basic genome-related traits such as genome size, number of coding genes and GC content (Madin et al. 2020). In total, the data frame covers 23 traits for over 15,000 unique species, although most species do not have a record for all of the traits. We decided not to include traits that were deduced through genome analyses, since this is a fast-developing field and annotations are continuing to improve rapidly.

Our group’s aim was not only to build a dataset of species traits, but to answer a variety of ecological questions from it. Those questions will be addressed in separate papers, we hope. But we think other research groups will also find this data merger useful. Ideally it may continue to develop as a community resource.
