Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre

Cited by: 5
Authors
Hunter, Benjamin [1 ,2 ]
Reis, Sara [1 ]
Campbell, Des [1 ]
Matharu, Sheila [1 ]
Ratnakumar, Prashanthi [3 ]
Mercuri, Luca [4 ]
Hindocha, Sumeet [1 ,2 ]
Kalsi, Hardeep [1 ,2 ]
Mayer, Erik [2 ,4 ]
Glampson, Ben [4 ]
Robinson, Emily J. [5 ]
Al-Lazikani, Bisan [6 ]
Scerri, Lisa [1 ]
Bloch, Susannah [3 ]
Lee, Richard [1 ,7 ,8 ]
Affiliations
[1] Royal Marsden Natl Hlth Serv NHS Fdn Trust, Lung Unit, London, England
[2] Imperial Coll London, Dept Surg & Canc, London, England
[3] Imperial Coll Healthcare Trust, Resp Med, London, England
[4] Imperial Coll Healthcare Natl Hlth Serv NHS Trust, Imperial Clin Analyt Res & Evaluat, London, England
[5] Royal Marsden Natl Hlth Serv NHS Fdn Trust, Royal Marsden Clin Trials Unit, London, England
[6] Inst Canc Res, Computat Biol & Chromogenet, London, England
[7] Imperial Coll London, Natl Heart & Lung Inst, London, England
[8] Inst Canc Res, Early Diag & Detect Genet & Epidemiol, London, England
Keywords
lung nodule; informatics; structured query language (SQL); natural language processing (NLP); machine learning; PULMONARY; IDENTIFICATION; RADIOLOGY;
DOI
10.3389/fmed.2021.748168
Chinese Library Classification (CLC) number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well-described. Manual service evaluation and research cohort curation can be time-consuming and potentially improved by automation.
Objective: To automate lung nodule identification in a tertiary cancer centre.
Methods: This retrospective cohort study used Electronic Healthcare Records to identify CT reports generated between 31st October 2011 and 24th July 2020. A structured query language/natural language processing tool was developed to classify reports according to lung nodule status. Performance was externally validated. Sentences were used to train machine-learning classifiers to predict concerning nodule features in 2,000 patients.
Results: 14,586 patients with lung nodules were identified. The cancer types most commonly associated with lung nodules were lung (39%), neuro-endocrine (38%), skin (35%), colorectal (33%) and sarcoma (33%). Lung nodule patients had a greater proportion of metastatic diagnoses (45 vs. 23%, p < 0.001), a higher mean post-baseline scan number (6.56 vs. 1.93, p < 0.001), and a shorter mean scan interval (4.1 vs. 5.9 months, p < 0.001) than those without nodules. Inter-observer agreement for sentence classification was 0.94 internally and 0.98 externally. Sensitivity and specificity for nodule identification were 93 and 99% internally, and 100 and 100% at external validation, respectively. A linear-support vector machine model predicted concerning sentence features with 94% accuracy.
Conclusion: We have developed and validated an accurate tool for automated lung nodule identification that is valuable for service evaluation and research data acquisition.
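The abstract does not give the tool's actual rules, so the following is only an illustrative sketch of the general idea: flag a CT report when a sentence mentions a nodule without a simple preceding negation, then score the flags against manual labels using sensitivity and specificity. The regular expressions, the function names `report_mentions_nodule` and `sensitivity_specificity`, and the example reports are all hypothetical and are not taken from the study.

```python
import re

# Hypothetical patterns; the study's real SQL/NLP rules are not described here.
NODULE_PATTERN = re.compile(r"\bnodul(e|es|ar)\b", re.IGNORECASE)
NEGATION_PATTERN = re.compile(
    r"\b(no|without|negative for)\b[^.]*\bnodul", re.IGNORECASE
)


def report_mentions_nodule(report: str) -> bool:
    """Return True if any sentence mentions a nodule that is not simply negated."""
    sentences = re.split(r"(?<=[.!?])\s+", report)
    for sentence in sentences:
        if NODULE_PATTERN.search(sentence) and not NEGATION_PATTERN.search(sentence):
            return True
    return False


def sensitivity_specificity(predicted, actual):
    """Compute (sensitivity, specificity) from paired boolean labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("Both positive and negative cases are required.")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example reports, for illustration only.
reports = [
    "There is a 6 mm nodule in the right upper lobe.",
    "No pulmonary nodules are seen.",
    "Lungs are clear.",
    "Multiple nodular opacities, without effusion.",
]
flags = [report_mentions_nodule(r) for r in reports]
```

A keyword rule of this kind is easy to audit but brittle; the linear support vector machine mentioned in the abstract would instead learn such cues from labelled sentences.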
Pages: 10