Mapping the plague through natural language processing

Cited by: 3

Authors:

Krauer, Fabienne [1]

Schmid, Boris V. [1]

Affiliation:

[1] Univ Oslo, Ctr Ecol & Evolutionary Synth, Dept Biosci, N-0316 Oslo, Norway

Source:

EPIDEMICS | 2022 / Vol. 41

Keywords:

Plague; Infectious diseases; Historical epidemiology; Outbreaks; Natural language processing; Machine learning

DOI:

10.1016/j.epidem.2022.100656

Chinese Library Classification (CLC) number:

R51 [Infectious diseases]

Discipline code:

100401

Abstract:

Pandemic diseases such as plague have produced a vast amount of literature providing information about their spatiotemporal extent, transmission and countermeasures. However, manually extracting such information from running text is tedious, and much of it remains locked in narrative form. Natural language processing (NLP) is a promising tool for the automated extraction of epidemiological data and can facilitate the construction of datasets. In this paper, we explore the utility of NLP to assist in the creation of a plague outbreak dataset. We produced a gold-standard list of toponyms by manually annotating a German plague treatise published by Sticker in 1908. We investigated the performance of five pre-trained NLP libraries (Google, Stanford CoreNLP, spaCy, germaNER and Geoparser) for the automated extraction of location data, compared with the gold standard. Of all tested algorithms, spaCy performed best (sensitivity 0.92, F1 score 0.83), followed closely by Stanford CoreNLP (sensitivity 0.81, F1 score 0.87). Google NLP performed slightly worse (sensitivity 0.78, F1 score 0.72). Geoparser and germaNER had poor sensitivity (0.41 and 0.61). We then evaluated how well automated geocoding services such as Google geocoding, Geonames and Geoparser located these outbreaks. All geocoding services performed poorly, particularly for historical regions, and returned the correct GIS information in only 60.4%, 52.7% and 33.8% of all cases. Finally, we compared our newly digitized plague dataset with a re-digitized version of the plague treatise by Biraben and provide an update of the spatiotemporal extent of outbreaks during the second plague pandemic. We conclude that NLP tools have their limitations, but they are potentially useful for accelerating data collection and the generation of a global plague outbreak database.
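The abstract scores each NLP library against a manually annotated gold standard using sensitivity and F1 score, and scores each geocoding service by the share of places it located correctly. The following is a minimal sketch of how such metrics are commonly computed; it is not the authors' code. Assumptions: toponym mentions match only by exact string, with repeated mentions counted; a geocoded point counts as correct when it lies within a hypothetical 25 km of a reference coordinate (the paper's actual correctness criterion is not given here). The helper names (`score_toponyms`, `geocoding_accuracy`, `haversine_km`) and the place names and coordinates in the example are made up for illustration.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Scores:
    sensitivity: float  # recall: share of gold-standard mentions that were extracted
    precision: float    # share of extracted mentions that appear in the gold standard
    f1: float           # harmonic mean of precision and sensitivity


def score_toponyms(gold: list[str], predicted: list[str]) -> Scores:
    """Score extracted toponym mentions against a gold standard (hypothetical helper).

    Assumption: exact string matching; the multiset intersection lets each
    gold mention be matched at most once.
    """
    true_pos = sum((Counter(gold) & Counter(predicted)).values())
    sensitivity = true_pos / len(gold) if gold else 0.0
    precision = true_pos / len(predicted) if predicted else 0.0
    total = precision + sensitivity
    f1 = 2 * precision * sensitivity / total if total else 0.0
    return Scores(sensitivity, precision, f1)


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp against floating-point overshoot before taking asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def geocoding_accuracy(
    returned: dict[str, tuple[float, float] | None],
    reference: dict[str, tuple[float, float]],
    tolerance_km: float = 25.0,  # hypothetical threshold, not from the paper
) -> float:
    """Share of reference places that the geocoder located within tolerance_km.

    A place the service did not resolve (missing key or None) counts as incorrect.
    """
    if not reference:
        return 0.0
    correct = 0
    for place, ref_point in reference.items():
        point = returned.get(place)
        if point is not None and haversine_km(point, ref_point) <= tolerance_km:
            correct += 1
    return correct / len(reference)


if __name__ == "__main__":
    # Toy data: made-up mentions and coordinates, for illustration only.
    gold = ["Venedig", "Genua", "Marseille", "Venedig"]
    predicted = ["Venedig", "Genua", "Pest"]
    s = score_toponyms(gold, predicted)
    print(f"{s.sensitivity:.3f} {s.precision:.3f} {s.f1:.3f}")  # 0.500 0.667 0.571

    reference = {"Venedig": (45.44, 12.33), "Genua": (44.41, 8.93)}
    returned = {"Venedig": (45.44, 12.34), "Genua": None}
    print(geocoding_accuracy(returned, reference))  # 0.5
```

The example shows why sensitivity and F1 can rank tools differently: a tool that extracts many candidate toponyms can score high sensitivity while spurious extractions lower its precision and therefore its F1 score.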


Pages: 8

50 entries in total

[31] An intelligent directory-assistance system using natural language processing and mapping
Kawabe, H
Fukumura, Y
Mutoh, N
Karasawa, H
Iwase, S
NINTH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 1997, : 486 - 487
[32] Putting Natural in Natural Language Processing
Chrupala, Grzegorz
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 7820 - 7827
[33] Mapping biomedical terminologies using natural language processing tools and UMLS: Mapping the Orphanet thesaurus to the MeSH
Merabti, T.
Joubert, M.
Lecroq, T.
Rath, A.
Darmoni, S. J.
IRBM, 2010, 31 (04) : 221 - 225
[34] Pharmacovigilance through the development of text mining and natural language processing techniques
Segura-Bedmar, Isabel
Martinez, Paloma
JOURNAL OF BIOMEDICAL INFORMATICS, 2015, 58 : 288 - 291
[35] Identifying Silver Linings During the Pandemic Through Natural Language Processing
Lossio-Ventura, Juan Antonio
Lee, Angela Yuson
Hancock, Jeffrey T.
Linos, Natalia
Linos, Eleni
FRONTIERS IN PSYCHOLOGY, 2021, 12
[36] Considerations for advancing nephrology research and practice through natural language processing
Parr, Sharidan K.
Gobbel, Glenn T.
KIDNEY INTERNATIONAL, 2020, 97 (02) : 263 - 265
[37] TRANSPORTABLE NATURAL-LANGUAGE PROCESSING THROUGH SIMPLICITY - THE PRE SYSTEM
EPSTEIN, SS
ACM TRANSACTIONS ON OFFICE INFORMATION SYSTEMS, 1985, 3 (02) : 107 - 120
[38] Semantic Analysis in the Automation of ER Modelling through Natural Language Processing
Omar, N.
Hanna, P.
Mc Kevitt, P.
2006 INTERNATIONAL CONFERENCE ON COMPUTING & INFORMATICS (ICOCI 2006), 2006, : 441 - +
[39] Detecting hate crimes through machine learning and natural language processing
Salazar, Ana Ortiz
POLICE PRACTICE AND RESEARCH, 2024
[40] Detection of Social Engineering Attacks Through Natural Language Processing of Conversations
Sawa, Yuki
Bhakta, Ram
Harris, Ian G.
Hadnagy, Christopher
2016 IEEE TENTH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2016, : 261 - 264
