Mapping the plague through natural language processing

Cited by: 3
Authors
Krauer, Fabienne [1]
Schmid, Boris V. [1]
Affiliations
[1] Univ Oslo, Ctr Ecol & Evolutionary Synth, Dept Biosci, N-0316 Oslo, Norway
Keywords
Plague; Infectious diseases; Historical epidemiology; Outbreaks; Natural language processing; Machine learning
DOI
10.1016/j.epidem.2022.100656
Chinese Library Classification (CLC) number
R51 [Infectious diseases]
Discipline classification code
100401
Abstract
Pandemic diseases such as plague have produced a vast amount of literature providing information about the spatiotemporal extent, transmission, or countermeasures. However, the manual extraction of such information from running text is a tedious process, and much of this information remains locked into a narrative format. Natural language processing (NLP) is a promising tool for the automated extraction of epidemiological data, and can facilitate the establishment of datasets. In this paper, we explore the utility of NLP to assist in the creation of a plague outbreak dataset. We produced a gold standard list of toponyms by manual annotation of a German plague treatise published by Sticker in 1908. We investigated the performance of five pre-trained NLP libraries (Google, Stanford CoreNLP, spaCy, germaNER and Geoparser) for the automated extraction of location data compared to the gold standard. Of all tested algorithms, spaCy performed best (sensitivity 0.92, F1 score 0.83), followed closely by Stanford CoreNLP (sensitivity 0.81, F1 score 0.87). Google NLP had a slightly lower performance (F1 score 0.72, sensitivity 0.78). Geoparser and germaNER had a poor sensitivity (0.41 and 0.61). We then evaluated how well automated geocoding services such as Google geocoding, Geonames and Geoparser located these outbreaks correctly. All geocoding services performed poorly, particularly for historical regions, and returned the correct GIS information only in 60.4%, 52.7% and 33.8% of all cases, respectively. Finally, we compared our newly digitized plague dataset to a re-digitized version of the plague treatise by Biraben and provide an update of the spatiotemporal extent of the second pandemic plague outbreaks. We conclude that NLP tools have their limitations, but they are potentially useful to accelerate the collection of data and the generation of a global plague outbreak database.
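The abstract reports sensitivity and F1 score for each library against a manually annotated gold standard. As a rough illustration of how those two metrics relate, the sketch below scores a set of extracted toponyms against a gold set. The function name `toponym_scores`, the set-based matching, and the example place names are assumptions made for illustration; the paper's actual matching procedure (for example per mention rather than per unique name) is not described in this record.

```python
def toponym_scores(gold, predicted):
    """Score extracted toponyms against a gold standard (set-based matching).

    Assumption: each unique place name counts once; the original study
    may have matched individual mentions instead.
    """
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # toponyms found by both annotator and tool
    fn = len(gold - predicted)   # gold toponyms the tool missed
    fp = len(predicted - gold)   # tool output that is not in the gold list

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}


# Hypothetical example data, not taken from the Sticker treatise.
gold = {"Wien", "Basel", "Venedig", "Konstantinopel"}
predicted = {"Wien", "Basel", "Venedig", "Pest"}  # "Pest" is a false positive here

scores = toponym_scores(gold, predicted)
print(scores)
```

Because F1 is the harmonic mean of precision and sensitivity, a tool can have the higher sensitivity yet the lower F1 if it also returns more false positives.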
Pages: 8