In late 2015 I began my PhD project investigating how rapid-onset disasters might affect ongoing armed conflicts, as part of a larger project on the security implications of climate change. Five years and a PhD thesis later, the GDIS dataset on geocoded disaster locations is finally available.
The idea for the dataset emerged early in the PhD project. Knowing that the International Disaster Database (EM-DAT) provided a list of all recorded disasters of a certain magnitude across the world, I planned a quantitative assessment of how disasters might influence various conflict dimensions and actors in ongoing armed conflicts. However, I quickly realized that even though disasters and armed conflict co-occur in both time and space in many countries, countries are large and conflict actors often operate in delimited areas within them. A disaster and a conflict recorded in the same country and year may therefore be hundreds of kilometers apart. I concluded that in order to proceed, I needed to know more about where the disasters occurred within each country.
It became clear that a dataset of subnational disaster locations would also be of interest beyond our immediate research project, and my supervisor and project leader and I decided to geocode the natural hazard-related disaster events listed in EM-DAT that had occurred after 1960. For the vast majority of the disasters, one or several locations were mentioned in a text column in EM-DAT. To identify and assign geographic information to these places, we relied on the Database of Global Administrative Areas (GADM), which provides maps and spatial data for all countries and their subdivisions.
The matching of locations was first done with automated scripts in R. Because the matching was based on text (the names of the places that had been affected by the disaster), substantial manual coding was also necessary. In addition to the many instances where spelling (and even language) differed across data sources, some locations were very specific (like a city neighborhood or village), while others were more diffuse (for example a mountain range or a cultural or ethnic area). With excellent research assistance, we manually went through all observations that did not match automatically in order to establish whether we could credibly place each of them within an administrative boundary, and which boundary that should be. With a candidate list of 11 000 disasters and 47 000 locations this was a time-consuming task, and in the end we identified 39 953 locations for 9 924 disasters.
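To give a sense of what such a matching step can look like, here is a minimal sketch in Python. It is not our actual R code: the function names, the similarity cutoff of 0.75, and the three example place names are all assumptions chosen for illustration. The idea is simply to normalize spelling, try an exact match against a list of administrative unit names, fall back to a fuzzy match, and flag everything else for manual coding.

```python
import difflib
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents, and turn punctuation into single spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def match_location(reported, admin_names, cutoff=0.75):
    """Return (matched name or None, status).

    status is 'exact', 'fuzzy', or 'manual' (needs a human coder).
    The cutoff is an illustrative assumption, not a tuned value.
    """
    lookup = {normalize(n): n for n in admin_names}
    key = normalize(reported)
    if key in lookup:
        return lookup[key], "exact"
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    if close:
        return lookup[close[0]], "fuzzy"
    return None, "manual"


# Hypothetical administrative unit names and reported disaster locations.
admin_units = ["Cusco", "Apurímac", "Ayacucho"]
for reported in ["Apurimac", "Cuzco", "Andes mountain range"]:
    print(reported, "->", match_location(reported, admin_units))
```

In this toy example a missing accent is resolved by normalization, a spelling variant is caught by the fuzzy step, and the mountain range falls through to the manual queue, which mirrors the kinds of cases described above. A fuzzy match is only a suggestion, so in practice it would still need to be checked by a person.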
The geographic information on disaster locations provided by GDIS enables connecting the disasters to virtually any other geographic data source. We hope that our data descriptor in Scientific Data and the dissemination of the dataset through NASA’s Socioeconomic Data and Applications Center (SEDAC) will reach beyond our own immediate research communities, and that the data will be widely used and push new frontiers of research.
Our paper in Scientific Data is available here.
Guha-Sapir, D., Below, R. & Hoyois, P. EM-DAT: International disaster database. Centre for Research on the Epidemiology of Disasters (CRED) (2014).
GADM. Database of Global Administrative Areas https://gadm.org/data.html (2018).
Rosvold, E. L. & Buhaug, H. Geocoded disaster (GDIS) dataset, 1960-2018. Socioeconomic Data and Applications Center (SEDAC) https://doi.org/10.7927/zz3b-8y61 (2021).