Automatized spatio-temporal detection of drought impacts from newspaper articles using natural language processing and machine learning

Cited by: 14
Authors
Sodoge, Jan [1, 2]
Kuhlicke, Christian [1, 2]
de Brito, Mariana Madruga [1]
Affiliations
[1] UFZ Helmholtz Ctr Environm Res, Dept Urban & Environm Sociol, D-04318 Leipzig, Germany
[2] Univ Potsdam, Inst Environm Sci & Geog, D-14476 Potsdam, Germany
Source
Weather and Climate Extremes
Keywords
Germany; Drought; NLP; Text mining; Machine learning; Natural hazards; Socio-economic impacts; Longitudinal study; Big data; Text; Events; Forest
DOI
10.1016/j.wace.2023.100574
Chinese Library Classification (CLC) number
P4 [Atmospheric sciences (meteorology)]
Discipline classification code
0706; 070601
Abstract
Droughts are expected to increase in both frequency and magnitude across Europe. Despite the many adverse effects these disasters impose on social-ecological systems, most impact assessments are constrained to single-event and/or single-sector analyses. Furthermore, existing longitudinal multi-sectoral datasets are limited in spatiotemporal homogeneity and scope, so the available impact data remain fragmented. To address this gap, we propose a novel method for the automated detection of drought impacts from newspaper articles. We employ natural language processing (NLP) and machine learning to identify different socio-economic impacts (e.g. agriculture, forestry, livestock, fires) and their geographic and temporal scope in 40,000 newspaper articles reporting on droughts in Germany between 2000 and 2021. Our method can track impacts over long time periods, allowing us to assess how drought impacts evolve. When evaluated on a human-annotated dataset, the automatic classification of impacts achieved accuracies of 92-96% per impact class. Moreover, when validated against independent impact and hazard data, the resulting impact dataset can replicate both temporal and spatial trends. Overall, the proposed approach advances current research because it (1) requires a significantly lower workload than conventional impact assessment methods, (2) can process large text datasets, (3) reduces subjectivity and human bias, (4) is generalizable to other hazard types and text corpora, and (5) achieves sufficient levels of accuracy. The findings highlight the applicability of NLP and machine learning for creating comprehensive longitudinal impact datasets.
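The abstract reports accuracies of 92-96% per impact class against a human-annotated dataset. The sketch below shows one common way such a per-class figure can be computed when an article may carry several impact labels at once: each class is scored as an independent present/absent decision. It is an illustration under assumptions, not the authors' evaluation code; the class names are taken from the examples in the abstract, while the function name `per_class_accuracy` and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical label set: the impact examples named in the abstract.
IMPACT_CLASSES = ("agriculture", "forestry", "livestock", "fires")


def per_class_accuracy(
    gold: List[Set[str]],
    predicted: List[Set[str]],
    classes: Sequence[str] = IMPACT_CLASSES,
) -> Dict[str, float]:
    """Per-class accuracy for multi-label article classification.

    gold[i] and predicted[i] are the sets of impact classes assigned to
    article i by a human annotator and by a classifier, respectively.
    For each class, an article counts as correct when both agree on
    whether that class is present or absent.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one annotated article")
    scores: Dict[str, float] = {}
    for c in classes:
        # Booleans sum as 0/1, giving the number of agreeing articles.
        correct = sum((c in g) == (c in p) for g, p in zip(gold, predicted))
        scores[c] = correct / len(gold)
    return scores


if __name__ == "__main__":
    # Toy annotations for four hypothetical articles.
    gold = [{"agriculture"}, {"forestry", "fires"}, set(), {"livestock", "agriculture"}]
    pred = [{"agriculture"}, {"forestry"}, {"fires"}, {"livestock"}]
    print(per_class_accuracy(gold, pred))
    # → {'agriculture': 0.75, 'forestry': 1.0, 'livestock': 1.0, 'fires': 0.5}
```

Because most articles mention only a few impact types, a per-class accuracy computed this way is dominated by correct "absent" decisions and can stay high even when a rare class is often missed; precision and recall per class are a useful complement when reading such figures.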
Pages: 9