Introducing the Global Lake area, Climate, and Population (GLCP) dataset

The Global Lake area, Climate, and Population (GLCP) dataset is a collection of annual lake surface areas for over 1.42 million lakes with co-located basin-level climate and population data from 1995 through 2015.


We are pleased to share with the larger research community the Global Lake area, Climate, and Population (GLCP) dataset: a collection of annual surface areas for over 1.42 million lakes with co-located basin-level climate and population data from 1995 through 2015. The dataset and its workflow are publicly available, and we encourage future users to both use and expand upon the core dataset. We hope that by publishing our data and workflow we can fast-track analyses that might otherwise proceed slowly because of the time and computing power needed to create a global dataset of this nature.

New data collection technologies and a movement towards data sharing have enabled large syntheses across many scientific disciplines. In the case of aquatic research, remote sensing has created unprecedented opportunities to track global and decadal changes in freshwater abundance. Combined with freely available datasets on global climate or human population, such data can provide meaningful context for observed environmental changes. Yet despite the availability and accessibility of these datasets, research is often hindered by the technical and computational hurdles required to integrate data sources into a single, analytically-friendly format. 

We encountered this first-hand when, in pursuit of a global, decadal-scale analysis of lake areas, we realized that the effort to create a dataset for such an analysis was itself a multi-year endeavor. We were fortunate that we had both the funded research time and access to high performance computing resources to create the dataset we needed. However, we recognize that not all researchers have this luxury. Therefore, we are proud to publicly share our new dataset, the Global Lake area, Climate, and Population (GLCP) dataset. The GLCP harmonizes global-scale aquatic, climate, and population data so that researchers with a broader range of access to computing resources and data wrangling capacity can leverage the rich data sources available to assess lake changes over the past 20 years. 

This data product stands on the shoulders of giants. The GLCP harmonizes existing global-scale datasets for lakes (HydroLAKES and the Joint Research Centre (JRC) Global Surface Water Dataset), lake basins (HydroBASINS), climate (Modern-Era Retrospective analysis for Research and Applications, MERRA-2), and population (Gridded Population of the World). The resulting dataset is a compilation of surface area for over 1.42 million lakes and reservoirs from 1995 to 2015 with co-located basin-level temperature, precipitation, and population data. The GLCP is intended to incorporate FAIR (findable, accessible, interoperable, reusable) principles and retains original identifiers from HydroLAKES and HydroBASINS to streamline reintegration with parent and other related datasets.

We believe that the dataset, together with the project’s workflow, script repositories, and documentation, will be useful to a broad audience working on environmental questions: researchers, natural resource managers, local and regional agencies, non-profits, and even citizen-scientists. With respect to the dataset itself, we foresee a wide range of possible applications for the GLCP, from local to global analyses. At the local scale, researchers can pair data collected in situ from their specific lake with the GLCP to see how permanent, seasonal, or total water surface area may relate to other environmental or ecological variables at that site. At larger scales, researchers can use the GLCP to investigate regional and global changes in water quantity, which has the potential to inform policies for human consumption, agriculture, and other aquatic-related issues.
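As a rough illustration of the local-scale use case, the sketch below pairs a handful of in situ measurements with lake surface areas by lake identifier and year. Everything in it is an assumption made for illustration: the column names (lake_id, year, total_km2, chlorophyll_ug_l) and the values are hypothetical placeholders, not the actual GLCP schema or real data, so please consult the dataset documentation for the real field names.

```python
import csv
import io

# Hypothetical GLCP-style rows: one row per lake per year.
# Column names and values are illustrative assumptions, not the real schema.
GLCP_CSV = """lake_id,year,total_km2
101,2013,12.4
101,2014,12.1
101,2015,11.8
202,2015,3.3
"""

# Hypothetical in situ measurements collected at one lake.
IN_SITU_CSV = """lake_id,year,chlorophyll_ug_l
101,2014,4.2
101,2015,5.0
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def pair_by_lake_year(glcp_rows, field_rows):
    """Inner join: keep field rows that have a matching (lake_id, year) area row."""
    area_by_key = {(r["lake_id"], r["year"]): r for r in glcp_rows}
    paired = []
    for row in field_rows:
        key = (row["lake_id"], row["year"])
        if key in area_by_key:
            # Merge the area columns and the field columns into one record.
            paired.append({**area_by_key[key], **row})
    return paired


paired = pair_by_lake_year(read_rows(GLCP_CSV), read_rows(IN_SITU_CSV))
for row in paired:
    print(row["lake_id"], row["year"], row["total_km2"], row["chlorophyll_ug_l"])
```

Because the GLCP retains the original HydroLAKES and HydroBASINS identifiers, the same kind of key-based join should carry over to the real files once the placeholder column names are replaced with the documented ones.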

The full GLCP data package is available through the Environmental Data Initiative. Along with the lake, climate, and population variables, it contains all Google Earth Engine and R scripts used to create the GLCP. By providing scripts and example file architectures, we hope that future users will be empowered to use our workflow and expand the dataset’s contents with additional remote sensing, climate, and socio-ecological variables.

If you have any questions about the dataset, please feel free to email or tweet at the authors:

Michael F. Meyer (@mishafredmeyer)

Stephanie G. Labou (@stephlabou)

Alli N. Cramer (@AlliNCramer)

Matthew R. Brousil (@mrbrousil)

Bradley T. Luff

This project was funded in part by an NSF GRF to MFM (DGE-1347973). The poster photo was taken at Lake Baikal (Siberia) by Michael F. Meyer.

Michael Meyer

Ph.D. Candidate @ Washington State University studying ecological effects of pharmaceuticals in lakes. Tweets are my own. RT=/= Endorsement. he/him/his, Washington State University