Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre

Cited by: 5
Authors
Hunter, Benjamin [1 ,2 ]
Reis, Sara [1 ]
Campbell, Des [1 ]
Matharu, Sheila [1 ]
Ratnakumar, Prashanthi [3 ]
Mercuri, Luca [4 ]
Hindocha, Sumeet [1 ,2 ]
Kalsi, Hardeep [1 ,2 ]
Mayer, Erik [2 ,4 ]
Glampson, Ben [4 ]
Robinson, Emily J. [5 ]
Al-Lazikani, Bisan [6 ]
Scerri, Lisa [1 ]
Bloch, Susannah [3 ]
Lee, Richard [1 ,7 ,8 ]
Affiliations
[1] Royal Marsden Natl Hlth Serv NHS Fdn Trust, Lung Unit, London, England
[2] Imperial Coll London, Dept Surg & Canc, London, England
[3] Imperial Coll Healthcare Trust, Resp Med, London, England
[4] Imperial Coll Healthcare Natl Hlth Serv NHS Trust, Imperial Clin Analyt Res & Evaluat, London, England
[5] Royal Marsden Natl Hlth Serv NHS Fdn Trust, Royal Marsden Clin Trials Unit, London, England
[6] Inst Canc Res, Computat Biol & Chromogenet, London, England
[7] Imperial Coll London, Natl Heart & Lung Inst, London, England
[8] Inst Canc Res, Early Diag & Detect Genet & Epidemiol, London, England
Keywords
lung nodule; informatics; structured query language (SQL); natural language processing (NLP); machine learning; PULMONARY; IDENTIFICATION; RADIOLOGY
DOI
10.3389/fmed.2021.748168
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well-described. Manual service evaluation and research cohort curation can be time-consuming and potentially improved by automation.
Objective: To automate lung nodule identification in a tertiary cancer centre.
Methods: This retrospective cohort study used Electronic Healthcare Records to identify CT reports generated between 31st October 2011 and 24th July 2020. A structured query language/natural language processing tool was developed to classify reports according to lung nodule status. Performance was externally validated. Sentences were used to train machine-learning classifiers to predict concerning nodule features in 2,000 patients.
Results: 14,586 patients with lung nodules were identified. The cancer types most commonly associated with lung nodules were lung (39%), neuro-endocrine (38%), skin (35%), colorectal (33%) and sarcoma (33%). Lung nodule patients had a greater proportion of metastatic diagnoses (45% vs. 23%, p < 0.001), a higher mean post-baseline scan number (6.56 vs. 1.93, p < 0.001), and a shorter mean scan interval (4.1 vs. 5.9 months, p < 0.001) than those without nodules. Inter-observer agreement for sentence classification was 0.94 internally and 0.98 externally. Sensitivity and specificity for nodule identification were 93% and 99% internally, and 100% and 100% at external validation, respectively. A linear support vector machine model predicted concerning sentence features with 94% accuracy.
Conclusion: We have developed and validated an accurate tool for automated lung nodule identification that is valuable for service evaluation and research data acquisition.
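The abstract reports sensitivity and specificity for report-level nodule identification but does not give the rules used. As a rough illustration only, the sketch below flags sentences that mention a nodule without an obvious negation and then scores the flags against reference labels. The regular expressions, the function names `mentions_nodule` and `sensitivity_specificity`, and the example sentences are all assumptions made for this sketch; they are not taken from the paper.

```python
import re

# Hypothetical patterns: the paper's actual SQL/NLP rules are not in the abstract.
NODULE = re.compile(r"\bnodul(?:e|es|ar)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:no|without|absence of)\b[^.]*\bnodul", re.IGNORECASE)


def mentions_nodule(sentence: str) -> bool:
    """Return True if the sentence mentions a nodule and is not plainly negated."""
    return bool(NODULE.search(sentence)) and not NEGATION.search(sentence)


def sensitivity_specificity(predicted, actual):
    """Compute (sensitivity, specificity) from parallel lists of booleans.

    Assumes at least one positive and one negative reference label;
    otherwise a division by zero occurs.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)          # flagged and truly positive
    tn = sum(not p and not a for p, a in pairs)  # not flagged and truly negative
    fp = sum(p and not a for p, a in pairs)      # flagged but truly negative
    fn = sum(not p and a for p, a in pairs)      # missed positive
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example sentences with made-up reference labels.
sentences = [
    "A 6 mm nodule is seen in the right upper lobe.",
    "No pulmonary nodules identified.",
    "The lungs are clear.",
    "Stable nodular opacity, unchanged.",
]
reference = [True, False, False, True]

predicted = [mentions_nodule(s) for s in sentences]
sens, spec = sensitivity_specificity(predicted, reference)
```

Sensitivity is the share of true nodule sentences that are flagged, and specificity is the share of non-nodule sentences that are correctly left unflagged. A real pipeline would need far more robust negation and uncertainty handling than this single pattern.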
Pages: 10
Related papers
50 in total
  • [21] NATURAL LANGUAGE QUERY PROCESSING IN A TEMPORAL DATABASE.
    De, Suranjan
    Pan, Shuh-Shen
    Whinston, Andrew B.
    Data and Knowledge Engineering, 1985, 1 (01): : 3 - 15
  • [22] Natural Language Processing to Identify Pulmonary Nodules and Extract Nodule Characteristics From Radiology Reports
    Zheng, Chengyi
    Huang, Brian Z.
    Agazaryan, Andranik A.
    Creekmur, Beth
    Osuj, Thearis A.
    Gould, Michael K.
    CHEST, 2021, 160 (05) : 1902 - 1914
  • [23] Novel and Efficient Clustering Algorithm Using Structured Query Language
    Suresh, L.
    Simha, Jay B.
    ICCN: 2008 INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING, 2008, : 625 - +
  • [24] Novel and Efficient Clustering Algorithm Using Structured Query Language
    Suresh, L.
    Simha, Jay. B.
    Rajappa, V.
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2008, 8 (11): : 406 - 410
  • [25] DEVELOPMENT AND VALIDATION OF A NATURAL LANGUAGE PROCESSING SYSTEM TO IDENTIFY INFORMATION ON KEY TRADEOFFS IN PROSTATE CANCER TREATMENT
    Daskivich, Timothy
    Luu, Michael
    Gale, Rebecca
    Khodyakov, Dmitry
    Freedland, Stephen
    Spiegel, Brennan
    JOURNAL OF UROLOGY, 2024, 211 (05): : E392 - E393
  • [26] Prospect of large language models and natural language processing for lung cancer diagnosis: A systematic review
    Garg, Arushi
    Gupta, Smridhi
    Vats, Soumya
    Handa, Palak
    Goel, Nidhi
    EXPERT SYSTEMS, 2024, 41 (11)
  • [27] VALIDATION OF A NATURAL LANGUAGE PROCESSING ALGORITHM TO IDENTIFY COLONIC ADENOMAS ACROSS A HEALTH SYSTEM
    Morgan, David G.
    Chorneyko, Kathy
    Swain, Deepak
    Bowes, Barbara
    Lee, Vicki
    Tinmouth, Jill
    GASTROENTEROLOGY, 2019, 156 (06) : S56 - S56
  • [28] Using Natural Language Processing to Identify Effective Influencers
    Fang, Xing
    Wang, Tianfu
    INTERNATIONAL JOURNAL OF MARKET RESEARCH, 2022, 64 (05) : 611 - 629
  • [29] Using natural language processing to identify depression in diabetes
    Fischer, Lucy R.
    Rush, William A.
    Kluznik, John C.
    O'Connor, Patrick J.
    Hanson, Ann M.
    Ekstrom, Heidi L.
    DIABETES, 2008, 57 : A349 - A349
  • [30] Development of a Deep Learning Natural Language Processing Model for Classification of Lung Cancer Radiology Reports
    Mithun, S.
    Jha, A. K.
    Sherkhane, U. B.
    Jaiswar, V.
    Nautiyal, A.
    Purandare, N. C.
    Rangarajan, V.
    Dekker, A.
    Wee, L.
    EUROPEAN JOURNAL OF NUCLEAR MEDICINE AND MOLECULAR IMAGING, 2021, 48 (SUPPL 1) : S330 - S330