Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre

Cited by: 5
Authors
Hunter, Benjamin [1, 2]
Reis, Sara [1]
Campbell, Des [1]
Matharu, Sheila [1]
Ratnakumar, Prashanthi [3]
Mercuri, Luca [4]
Hindocha, Sumeet [1, 2]
Kalsi, Hardeep [1, 2]
Mayer, Erik [2, 4]
Glampson, Ben [4]
Robinson, Emily J. [5]
Al-Lazikani, Bisan [6]
Scerri, Lisa [1]
Bloch, Susannah [3]
Lee, Richard [1, 7, 8]
Affiliations
[1] Royal Marsden Natl Hlth Serv NHS Fdn Trust, Lung Unit, London, England
[2] Imperial Coll London, Dept Surg & Canc, London, England
[3] Imperial Coll Healthcare Trust, Resp Med, London, England
[4] Imperial Coll Healthcare Natl Hlth Serv NHS Trust, Imperial Clin Analyt Res & Evaluat, London, England
[5] Royal Marsden Natl Hlth Serv NHS Fdn Trust, Royal Marsden Clin Trials Unit, London, England
[6] Inst Canc Res, Computat Biol & Chromogenet, London, England
[7] Imperial Coll London, Natl Heart & Lung Inst, London, England
[8] Inst Canc Res, Early Diag & Detect Genet & Epidemiol, London, England
Keywords
lung nodule; informatics; structured query language (SQL); natural language processing (NLP); machine learning; PULMONARY; IDENTIFICATION; RADIOLOGY;
DOI
10.3389/fmed.2021.748168
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well-described. Manual service evaluation and research cohort curation can be time-consuming and potentially improved by automation.
Objective: To automate lung nodule identification in a tertiary cancer centre.
Methods: This retrospective cohort study used Electronic Healthcare Records to identify CT reports generated between 31st October 2011 and 24th July 2020. A structured query language/natural language processing tool was developed to classify reports according to lung nodule status. Performance was externally validated. Sentences were used to train machine-learning classifiers to predict concerning nodule features in 2,000 patients.
Results: 14,586 patients with lung nodules were identified. The cancer types most commonly associated with lung nodules were lung (39%), neuro-endocrine (38%), skin (35%), colorectal (33%) and sarcoma (33%). Lung nodule patients had a greater proportion of metastatic diagnoses (45 vs. 23%, p < 0.001), a higher mean post-baseline scan number (6.56 vs. 1.93, p < 0.001), and a shorter mean scan interval (4.1 vs. 5.9 months, p < 0.001) than those without nodules. Inter-observer agreement for sentence classification was 0.94 internally and 0.98 externally. Sensitivity and specificity for nodule identification were 93 and 99% internally, and 100 and 100% at external validation, respectively. A linear-support vector machine model predicted concerning sentence features with 94% accuracy.
Conclusion: We have developed and validated an accurate tool for automated lung nodule identification that is valuable for service evaluation and research data acquisition.
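The abstract reports sensitivity and specificity for report-level nodule identification but does not describe the rules used. Purely as an illustration, and not as the authors' tool, the sketch below shows one way a rule-based classifier with simple sentence-level negation handling could be scored against manual labels. The regular expressions, the function names `classify_report` and `sensitivity_specificity`, and the example report texts are all assumptions invented for this sketch.

```python
import re

# Hypothetical patterns: the study's actual SQL/NLP rules are not given in the abstract.
NODULE = re.compile(r"\bnodules?\b", re.IGNORECASE)
NEGATED = re.compile(r"\b(no|without|negative for)\b[^.]*\bnodules?\b", re.IGNORECASE)


def classify_report(text: str) -> bool:
    """Return True if any sentence mentions a nodule that is not negated."""
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if NODULE.search(sentence) and not NEGATED.search(sentence):
            return True
    return False


def sensitivity_specificity(predicted, actual):
    """Compute (sensitivity, specificity) from paired boolean labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)          # nodule present, flagged
    fn = sum((not p) and a for p, a in pairs)    # nodule present, missed
    tn = sum((not p) and (not a) for p, a in pairs)
    fp = sum(p and (not a) for p, a in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Made-up report snippets with made-up manual labels.
    reports = [
        "There is a 6 mm nodule in the right upper lobe.",
        "No pulmonary nodules are seen.",
        "Lungs are clear. No pleural effusion.",
        "No pleural effusion. Stable 4 mm nodule in the left lower lobe.",
    ]
    manual = [True, False, False, True]
    predicted = [classify_report(r) for r in reports]
    print(sensitivity_specificity(predicted, manual))
```

Negation is checked per sentence so that a phrase such as "No pleural effusion" does not suppress a nodule mentioned in the next sentence. A production system would need a far richer vocabulary and validated negation handling.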
Pages: 10