Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre

Cited by: 5

Authors:

Hunter, Benjamin ^{[1,2]}

Reis, Sara ^{[1]}

Campbell, Des ^{[1]}

Matharu, Sheila ^{[1]}

Ratnakumar, Prashanthi ^{[3]}

Mercuri, Luca ^{[4]}

Hindocha, Sumeet ^{[1,2]}

Kalsi, Hardeep ^{[1,2]}

Mayer, Erik ^{[2,4]}

Glampson, Ben ^{[4]}

Robinson, Emily J. ^{[5]}

Al-Lazikani, Bisan ^{[6]}

Scerri, Lisa ^{[1]}

Bloch, Susannah ^{[3]}

Lee, Richard ^{[1,7,8]}

Affiliations:

[1] Royal Marsden Natl Hlth Serv NHS Fdn Trust, Lung Unit, London, England

[2] Imperial Coll London, Dept Surg & Canc, London, England

[3] Imperial Coll Healthcare Trust, Resp Med, London, England

[4] Imperial Coll Healthcare Natl Hlth Serv NHS Trust, Imperial Clin Analyt Res & Evaluat, London, England

[5] Royal Marsden Natl Hlth Serv NHS Fdn Trust, Royal Marsden Clin Trials Unit, London, England

[6] Inst Canc Res, Computat Biol & Chromogenet, London, England

[7] Imperial Coll London, Natl Heart & Lung Inst, London, England

[8] Inst Canc Res, Early Diag & Detect Genet & Epidemiol, London, England

Source:

FRONTIERS IN MEDICINE | 2021 / Volume 8

Keywords:

lung nodule; informatics; structured query language (SQL); natural language processing (NLP); machine learning; PULMONARY; IDENTIFICATION; RADIOLOGY;

DOI:

10.3389/fmed.2021.748168

Chinese Library Classification number:

R5 [Internal Medicine];

Discipline classification code:

1002 ; 100201 ;

Abstract:

Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well-described. Manual service evaluation and research cohort curation can be time-consuming and potentially improved by automation.

Objective: To automate lung nodule identification in a tertiary cancer centre.

Methods: This retrospective cohort study used Electronic Healthcare Records to identify CT reports generated between 31st October 2011 and 24th July 2020. A structured query language/natural language processing tool was developed to classify reports according to lung nodule status. Performance was externally validated. Sentences were used to train machine-learning classifiers to predict concerning nodule features in 2,000 patients.

Results: 14,586 patients with lung nodules were identified. The cancer types most commonly associated with lung nodules were lung (39%), neuro-endocrine (38%), skin (35%), colorectal (33%) and sarcoma (33%). Lung nodule patients had a greater proportion of metastatic diagnoses (45 vs. 23%, p < 0.001), a higher mean post-baseline scan number (6.56 vs. 1.93, p < 0.001), and a shorter mean scan interval (4.1 vs. 5.9 months, p < 0.001) than those without nodules. Inter-observer agreement for sentence classification was 0.94 internally and 0.98 externally. Sensitivity and specificity for nodule identification were 93 and 99% internally, and 100 and 100% at external validation, respectively. A linear-support vector machine model predicted concerning sentence features with 94% accuracy.

Conclusion: We have developed and validated an accurate tool for automated lung nodule identification that is valuable for service evaluation and research data acquisition.
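The abstract reports sensitivity and specificity for nodule identification. As a minimal sketch of how those two metrics are computed from binary labels, the Python below may help; the function name and the toy label lists are hypothetical illustrations, not the study's code or data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = nodule, 0 = no nodule)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")
    sensitivity = tp / (tp + fn)  # share of true nodule reports that were detected
    specificity = tn / (tn + fp)  # share of nodule-free reports correctly excluded
    return sensitivity, specificity


# Hypothetical toy labels, for illustration only (not study data).
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(reference, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Here the reference labels stand in for manual review of each report, and the predicted labels stand in for the output of an automated classifier.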


Pages: 10

