About the Article:
The recently published article, "So2Sat POP - A Curated Benchmark Data Set for Population Estimation from Space on a Continental Scale," presents a comprehensive data set for population estimation in 98 European cities. The cities cover 28 European Union (EU) member states and the four EFTA countries, and they represent a wide range of topography, demography, and architectural styles. Because the data set is already curated, researchers do not need to collect and process new data in order to develop and validate their methods. The data set combines digital elevation models (DEM), local climate zones (LCZ), land use (LU), and nighttime lights (VIIRS) with multi-spectral Sentinel-2 imagery (SEN2) and data from the OpenStreetMap initiative (OSM). This combination of data sources has not been explored before in the domain of population estimation. We expect it to be a valuable resource for the research community in developing sophisticated approaches to population estimation.
About the Methodology:
Figure 1 shows, step by step, how the input data for each city is preprocessed. All input data is cropped to the city borders established by our own algorithm.
The input data processed in the first step is then used to construct the patches. Figure 2 shows samples from the odd-numbered classes of our data set, together with the corresponding patch set, population class, and population count. Lower classes correspond to lightly populated areas, and their patches consist largely of bare ground, water, and green fields. As the class number increases, the patches show built-up regions ranging from sparse low-rise to dense high-rise. In other words, lower-class patches correspond to rural areas and higher-class patches to urban areas.
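As a rough illustration of the patch construction described above, the following Python sketch splits a gridded population layer into non-overlapping square patches and assigns each patch a population count and class. The function names, the toy grid, the patch size, and in particular the class thresholds in `CLASS_EDGES` are assumptions made for this example. They are not the binning used in the So2Sat POP data set.

```python
from bisect import bisect_right

# Hypothetical class thresholds (inhabitants per patch); NOT the data set's binning.
CLASS_EDGES = [1, 10, 100, 1000]


def split_into_patches(grid, patch_size):
    """Split a 2-D grid (list of rows) into non-overlapping square patches.

    Returns a list of ((row, col), patch) pairs, where (row, col) is the
    top-left cell of the patch. Incomplete patches at the right and bottom
    edges are dropped.
    """
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    patches = []
    for r in range(0, n_rows - patch_size + 1, patch_size):
        for c in range(0, n_cols - patch_size + 1, patch_size):
            patch = [row[c:c + patch_size] for row in grid[r:r + patch_size]]
            patches.append(((r, c), patch))
    return patches


def population_count(patch):
    """Sum the population over all cells of a patch."""
    return sum(sum(row) for row in patch)


def population_class(count):
    """Map a population count to a class index using the assumed thresholds."""
    return bisect_right(CLASS_EDGES, count)


# Toy population grid: empty top-left corner, dense bottom-right corner.
toy_grid = [
    [0, 0, 3, 4],
    [0, 0, 1, 2],
    [20, 30, 200, 300],
    [25, 25, 250, 250],
]

labelled = {
    origin: (population_count(p), population_class(population_count(p)))
    for origin, p in split_into_patches(toy_grid, 2)
}
```

In this toy grid the empty corner falls into the lowest class and the dense corner into the highest, mirroring the rural-to-urban ordering of the classes.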
To demonstrate the potential of our data set, we trained a Random Forest model on features extracted from the input data to estimate the population on our test data set. The preliminary findings suggest that the So2Sat POP data set offers a feasible basis for developing powerful machine learning techniques.
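To make the idea of "extracted features" more concrete, here is a minimal Python sketch that turns the layers of one patch into a flat feature vector of summary statistics. The choice of statistics (mean, standard deviation, minimum, maximum), the function name `patch_features`, and the toy values are assumptions for illustration only, since the features actually used in the article are not described here.

```python
from statistics import mean, pstdev


def patch_features(layers):
    """Build a flat feature vector from a dict of layer name -> 2-D patch.

    For each layer, in sorted name order, four values are appended:
    mean, population standard deviation, minimum, and maximum.
    """
    features = []
    for name in sorted(layers):
        values = [v for row in layers[name] for v in row]
        features.extend([mean(values), pstdev(values), min(values), max(values)])
    return features


# Toy 2x2 patch with two of the layers named in the article (values are made up).
toy_patch = {
    "dem": [[1, 3], [5, 7]],
    "viirs": [[2, 2], [2, 2]],
}

feature_vector = patch_features(toy_patch)
```

Vectors of this kind, one per patch, together with the patch population counts or classes, are the sort of tabular input a Random Forest implementation expects.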