So2Sat POP - A Curated Benchmark Data Set for Population Estimation from Space on a Continental Scale


About the Article:

The recently published article, "So2Sat POP - A Curated Benchmark Data Set for Population Estimation from Space on a Continental Scale," provides a comprehensive data set for population estimation in 98 European cities. The cities span 28 European Union (EU) member states and the four EFTA countries, and together they represent a wide range of topography, demography, and architectural styles. A ready-made benchmark removes the need to collect and process a new data set each time a population estimation method is developed or validated. The data set combines digital elevation models (DEM), local climate zones (LCZ), land use (LU), and nighttime lights (VIIRS) with multi-spectral Sentinel-2 imagery (SEN2) and data from the OpenStreetMap initiative (OSM). This combination of data sources has not been explored before in the domain of population estimation. We expect it to be a valuable resource for the research community in developing sophisticated population estimation approaches.

About the Methodology:

Figure 1 shows, step by step, how the data from each source are preprocessed to produce the input data for each city. All input data were cropped to the city boundaries established by our own algorithm.

Figure 1: Step-by-step preprocessing of all the input data sources to prepare the corresponding input data for each city.

The input data produced in the first step were then used to construct the patches. Figure 2 shows samples from the odd-numbered classes of our data set, along with the corresponding patch set, population class, and population count. Lower classes correspond to sparsely populated areas: their patches consist largely of bare ground, water, and green fields. As the class number increases, patches show built-up regions ranging from sparse low-rise to dense high-rise. In other words, the lower classes correspond to rural areas and the higher classes to urban areas.

Figure 2: Sample patches from the odd-numbered classes of our data set. Lower classes depict sparsely populated regions, while higher classes depict densely populated regions.
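To make the patch construction step more concrete, here is a minimal sketch of cutting a co-registered, multi-channel city raster into non-overlapping square patches. The function name `tile_into_patches`, the patch size, and the toy array are all hypothetical; the article does not state the patch dimensions or the tiling procedure actually used, so this shows only one plausible way to do it.

```python
import numpy as np

def tile_into_patches(raster, patch_size):
    """Split an (H, W, C) array into non-overlapping patches.

    Returns an array of shape (N, patch_size, patch_size, C).
    Rows and columns that do not fill a whole patch are dropped.
    """
    h, w, c = raster.shape
    rows, cols = h // patch_size, w // patch_size
    trimmed = raster[: rows * patch_size, : cols * patch_size]
    # Expose the patch grid as separate axes, then bring the two grid axes together.
    grid = trimmed.reshape(rows, patch_size, cols, patch_size, c).swapaxes(1, 2)
    return grid.reshape(rows * cols, patch_size, patch_size, c)

# Toy stand-in for a cropped city raster with 2 channels (not real data).
raster = np.arange(5 * 7 * 2).reshape(5, 7, 2)
patches = tile_into_patches(raster, patch_size=2)  # patch_size is an assumption
print(patches.shape)
```

Patches are ordered row by row across the grid, so each patch index can be mapped back to its position in the city raster and matched with a population count for that cell.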

To demonstrate the potential of our data set, we trained a Random Forest model on features extracted from the input data and evaluated its population estimates on our test set. The preliminary results suggest that the So2Sat POP data set is a viable basis for developing effective machine learning methods.
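A baseline of this kind can be sketched with scikit-learn. The example below is illustrative only: it assumes a regression setup with separate training and held-out splits, and it uses synthetic features and a made-up target in place of real So2Sat POP features. The number of features, the hyperparameters, and the helper `synthetic_population` are hypothetical and are not taken from the article.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

# Synthetic stand-in for per-patch features (hypothetical: 6 features per patch).
n_train, n_test, n_features = 400, 100, 6
X_train = rng.random((n_train, n_features))
X_test = rng.random((n_test, n_features))

def synthetic_population(X):
    # Made-up relationship, used only so the example has a learnable target.
    return 1000.0 * X[:, 0] + 500.0 * X[:, 1] * X[:, 2]

y_train = synthetic_population(X_train)
y_test = synthetic_population(X_test)

# Fit on the training split only, then estimate population for held-out patches.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
print(f"MAE on held-out synthetic patches: {mae:.1f}")
```

Keeping the held-out patches out of `fit` is what makes the reported error an estimate of how the model would behave on unseen patches.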
