Automatized spatio-temporal detection of drought impacts from newspaper articles using natural language processing and machine learning

Cited by: 14
Authors
Sodoge, Jan [1, 2]
Kuhlicke, Christian [1, 2]
de Brito, Mariana Madruga [1]
Affiliations
[1] UFZ Helmholtz Ctr Environm Res, Dept Urban & Environm Sociol, D-04318 Leipzig, Germany
[2] Univ Potsdam, Inst Environm Sci & Geog, D-14476 Potsdam, Germany
Source
Keywords
Germany; Drought; NLP; Text mining; Machine learning; Natural hazards; Socio-economic impacts; Longitudinal study; BIG DATA; TEXT; EVENTS; FOREST
DOI
10.1016/j.wace.2023.100574
Chinese Library Classification
P4 [Atmospheric Sciences (Meteorology)]
Discipline classification code
0706; 070601
Abstract
Droughts are expected to increase both in terms of frequency and magnitude across Europe. Despite the multitude of adverse effects these disasters impose on social-ecological systems, most impact assessments are constrained to single event and/or single sector analyses. Furthermore, existing longitudinal multi-sectoral datasets are limited in spatiotemporal homogeneity and scope, resulting in fragmented datasets. To address this gap, we propose a novel method for the automatized detection of drought impacts based on newspaper articles. We employ natural language processing (NLP) and machine learning to identify different socio-economic impacts (e.g. agriculture, forestry, livestock, fires) and their geographic and temporal scope from 40,000 newspaper articles reporting about droughts in Germany between 2000 and 2021. Our method is able to track impacts over long time periods, allowing us to assess how drought impacts evolve. Accuracy levels of 92-96% per impact class were obtained for the automatic classification of the impacts when evaluated on a human-annotated dataset. Furthermore, our resulting impact dataset can replicate both temporal and spatial trends when validated against independent impact and hazard data. Overall, the proposed approach advances current research as it (1) requires a significantly lower workload than conventional impact assessment methods, (2) allows addressing large text datasets, (3) reduces subjectivity and human bias, (4) is generalizable to other hazard types as well as text corpora, and (5) achieves sufficient levels of accuracy. The findings highlight the applicability of NLP and machine learning to create comprehensive longitudinal impact datasets.
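The per-class accuracy figures quoted in the abstract can be illustrated with a minimal sketch. It assumes each article carries a set of impact labels (multi-label setting) and that accuracy for a class is the share of articles whose presence/absence label matches the human annotation; the paper may define or compute the metric differently. The function name `per_class_accuracy` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set

# Impact classes named in the abstract; the paper may distinguish more.
IMPACT_CLASSES = ["agriculture", "forestry", "livestock", "fires"]


def per_class_accuracy(
    gold: List[Set[str]],
    predicted: List[Set[str]],
    classes: List[str],
) -> Dict[str, float]:
    """Return, per class, the share of articles labelled correctly.

    An article counts as correct for a class when the prediction and the
    human annotation agree on whether that class is present.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one annotated article")
    scores: Dict[str, float] = {}
    for cls in classes:
        correct = sum((cls in g) == (cls in p) for g, p in zip(gold, predicted))
        scores[cls] = correct / len(gold)
    return scores


# Hypothetical toy data: four articles, human labels vs. model output.
gold = [{"agriculture"}, {"forestry", "fires"}, set(), {"livestock"}]
predicted = [{"agriculture"}, {"fires"}, set(), {"livestock", "agriculture"}]

accuracy = per_class_accuracy(gold, predicted, IMPACT_CLASSES)
for cls, acc in accuracy.items():
    print(f"{cls}: {acc:.2f}")
```

Reporting one score per class, rather than a single pooled score, keeps a weak class visible even when the others perform well.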
Pages: 9
Related papers
50 in total
  • [1] Automatized Drought Impact Detection Using Natural Language Processing
    Sodoge, Jan
    de Brito, Mariana Madruga
    Kuhlicke, Christian
    WASSERWIRTSCHAFT, 2022, 112 : 30 - 31
  • [2] Recategorizing Interdisciplinary Articles Using Natural Language Processing and Machine/Deep Learning
    Tanaka, Kazuya
    Arakawa, Riku
    Kameoka, Yasuaki
    Sakai, Ichiro
    2018 PORTLAND INTERNATIONAL CONFERENCE ON MANAGEMENT OF ENGINEERING AND TECHNOLOGY (PICMET '18): MANAGING TECHNOLOGICAL ENTREPRENEURSHIP: THE ENGINE FOR ECONOMIC GROWTH, 2018,
  • [3] Building natural language responses from natural language questions in the spatio-temporal context
    Landoulsi G.
    Mahmoudi K.
    Faïz S.
    International Journal of Intelligent Information and Database Systems, 2021, 14 (01) : 1 - 25
  • [4] Predicting High Impact Ophthalmology Articles Using Machine Learning and Natural Language Processing
    Karandikar, Yash
    Wang, Sophia Y.
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2021, 62 (08)
  • [5] Spatio-temporal tracking from natural language statements using outer probability theory
    Bishop, Adrian N.
    Houssineau, Jeremie
    Angley, Daniel
    Ristic, Branko
    INFORMATION SCIENCES, 2018, 463 : 56 - 74
  • [6] Stock Prices Prediction using the Title of Newspaper Articles with Korean Natural Language Processing
    Yun, Hyungbin
    Sim, Ghudae
    Seok, Junhee
    2019 1ST INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (ICAIIC 2019), 2019, : 19 - 21
  • [7] Network Intrusion Detection using Natural Language Processing and Ensemble Machine Learning
    Das, Saikat
    Ashrafuzzaman, Mohammad
    Sheldon, Frederick T.
    Shiva, Sajjan
    2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2020, : 829 - 835
  • [8] Detection of Fake News Using Machine Learning and Natural Language Processing Algorithms
    Prachi, Noshin Nirvana
    Habibullah, Md.
    Rafi, Md. Emanul Haque
    Alam, Evan
    Khan, Riasat
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2022, 13 (06) : 652 - 661
  • [9] Urban Event Detection from Spatio-temporal IoT Sensor Data Using Graph-Based Machine Learning
    Park, Dae-Young
    Ko, In-Young
    2022 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (IEEE BIGCOMP 2022), 2022, : 234 - 241
  • [10] Spatio-temporal variation of meteorological, hydrological and agricultural drought vulnerability: Insights from statistical, machine learning and wavelet analysis
    Saha, Asish
    Pal, Subodh Chandra
    GROUNDWATER FOR SUSTAINABLE DEVELOPMENT, 2024, 27