Mapping the plague through natural language processing

Cited by: 3
Authors
Krauer, Fabienne [1]
Schmid, Boris V. [1]
Affiliations
[1] Univ Oslo, Ctr Ecol & Evolutionary Synth, Dept Biosci, N-0316 Oslo, Norway
Keywords
Plague; Infectious diseases; Historical epidemiology; Outbreaks; Natural language processing; Machine learning
DOI
10.1016/j.epidem.2022.100656
Chinese Library Classification (CLC) number
R51 [Infectious diseases]
Discipline classification code
100401
Abstract
Pandemic diseases such as plague have produced a vast amount of literature providing information about the spatiotemporal extent, transmission, or countermeasures. However, the manual extraction of such information from running text is a tedious process, and much of this information remains locked into a narrative format. Natural language processing (NLP) is a promising tool for the automated extraction of epidemiological data, and can facilitate the establishment of datasets. In this paper, we explore the utility of NLP to assist in the creation of a plague outbreak dataset. We produced a gold standard list of toponyms by manual annotation of a German plague treatise published by Sticker in 1908. We investigated the performance of five pre-trained NLP libraries (Google, Stanford CoreNLP, spaCy, germaNER and Geoparser) for the automated extraction of location data compared to the gold standard. Of all tested algorithms, spaCy performed best (sensitivity 0.92, F1 score 0.83), followed closely by Stanford CoreNLP (sensitivity 0.81, F1 score 0.87). Google NLP had a slightly lower performance (F1 score 0.72, sensitivity 0.78). Geoparser and germaNER had a poor sensitivity (0.41 and 0.61, respectively). We then evaluated how well automated geocoding services such as Google geocoding, Geonames and Geoparser located these outbreaks correctly. All geocoding services performed poorly, particularly for historical regions, and returned the correct GIS information in only 60.4%, 52.7% and 33.8% of all cases, respectively. Finally, we compared our newly digitized plague dataset to a re-digitized version of the plague treatise by Biraben and provide an update of the spatiotemporal extent of the second pandemic plague outbreaks. We conclude that NLP tools have their limitations, but they are potentially useful to accelerate the collection of data and the generation of a global plague outbreak database.
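The sensitivity and F1 figures quoted in the abstract can be illustrated with a minimal sketch of how extracted toponyms might be scored against a gold standard. This is not the authors' evaluation code: the function name `score_toponyms`, the case-insensitive set matching, and the toy place names are all assumptions made for illustration, and the record does not say whether the paper scored unique names or individual mentions.

```python
def score_toponyms(gold, predicted):
    """Return (sensitivity, precision, f1) for predicted toponyms vs. a gold list.

    Assumption: names are compared as case-insensitive unique strings.
    """
    gold_set = {name.strip().lower() for name in gold}
    pred_set = {name.strip().lower() for name in predicted}

    true_pos = len(gold_set & pred_set)   # toponyms found by the tool
    false_neg = len(gold_set - pred_set)  # gold toponyms the tool missed
    false_pos = len(pred_set - gold_set)  # spurious extractions

    found = true_pos + false_neg
    extracted = true_pos + false_pos
    sensitivity = true_pos / found if found else 0.0
    precision = true_pos / extracted if extracted else 0.0
    total = precision + sensitivity
    f1 = 2 * precision * sensitivity / total if total else 0.0
    return sensitivity, precision, f1


# Hypothetical toy data, not taken from the Sticker treatise annotations.
gold = ["Marseille", "Wien", "Danzig", "Konstantinopel"]
predicted = ["Marseille", "Wien", "Danzig", "Sticker"]  # a person name mistaken for a place

sens, prec, f1 = score_toponyms(gold, predicted)
print(f"sensitivity={sens:.2f} precision={prec:.2f} F1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and sensitivity, a tool can have higher sensitivity yet a lower F1 than another if it also returns more false positives, which is consistent with the spaCy and Stanford CoreNLP figures reported above.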
Pages: 8