Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre

Times cited: 5
Authors
Hunter, Benjamin [1, 2]
Reis, Sara [1]
Campbell, Des [1]
Matharu, Sheila [1]
Ratnakumar, Prashanthi [3]
Mercuri, Luca [4]
Hindocha, Sumeet [1, 2]
Kalsi, Hardeep [1, 2]
Mayer, Erik [2, 4]
Glampson, Ben [4]
Robinson, Emily J. [5]
Al-Lazikani, Bisan [6]
Scerri, Lisa [1]
Bloch, Susannah [3]
Lee, Richard [1, 7, 8]
Affiliations
[1] Royal Marsden Natl Hlth Serv NHS Fdn Trust, Lung Unit, London, England
[2] Imperial Coll London, Dept Surg & Canc, London, England
[3] Imperial Coll Healthcare Trust, Resp Med, London, England
[4] Imperial Coll Healthcare Natl Hlth Serv NHS Trust, Imperial Clin Analyt Res & Evaluat, London, England
[5] Royal Marsden Natl Hlth Serv NHS Fdn Trust, Royal Marsden Clin Trials Unit, London, England
[6] Inst Canc Res, Computat Biol & Chromogenet, London, England
[7] Imperial Coll London, Natl Heart & Lung Inst, London, England
[8] Inst Canc Res, Early Diag & Detect Genet & Epidemiol, London, England
Keywords
lung nodule; informatics; structured query language (SQL); natural language processing (NLP); machine learning; PULMONARY; IDENTIFICATION; RADIOLOGY;
DOI
10.3389/fmed.2021.748168
Chinese Library Classification: R5 [Internal Medicine]
Subject classification codes: 1002; 100201
Abstract
Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well described. Manual service evaluation and research cohort curation can be time-consuming and could potentially be improved by automation.

Objective: To automate lung nodule identification in a tertiary cancer centre.

Methods: This retrospective cohort study used Electronic Healthcare Records to identify CT reports generated between 31st October 2011 and 24th July 2020. A structured query language/natural language processing tool was developed to classify reports according to lung nodule status. Performance was externally validated. Sentences were used to train machine-learning classifiers to predict concerning nodule features in 2,000 patients.

Results: 14,586 patients with lung nodules were identified. The cancer types most commonly associated with lung nodules were lung (39%), neuro-endocrine (38%), skin (35%), colorectal (33%) and sarcoma (33%). Lung nodule patients had a greater proportion of metastatic diagnoses (45 vs. 23%, p < 0.001), a higher mean post-baseline scan number (6.56 vs. 1.93, p < 0.001), and a shorter mean scan interval (4.1 vs. 5.9 months, p < 0.001) than those without nodules. Inter-observer agreement for sentence classification was 0.94 internally and 0.98 externally. Sensitivity and specificity for nodule identification were 93 and 99% internally, and 100 and 100% at external validation, respectively. A linear support vector machine model predicted concerning sentence features with 94% accuracy.

Conclusion: We have developed and validated an accurate tool for automated lung nodule identification that is valuable for service evaluation and research data acquisition.
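The abstract describes classifying CT reports by nodule status from their text and scoring the classifier by sensitivity and specificity. The sketch below illustrates only those two ideas in their simplest form and is not the authors' SQL/NLP tool: it splits a report into sentences, flags a report as nodule-positive when a sentence mentions a nodule without a preceding negation cue, and then computes sensitivity, TP/(TP+FN), and specificity, TN/(TN+FP), against reference labels. The regular expressions, the negation list, and the function names (`report_has_nodule`, `sensitivity_specificity`) are hypothetical assumptions chosen for illustration.

```python
import re
from typing import Iterable

# Hypothetical patterns for illustration only; not the study's actual rules.
NODULE_TERM = re.compile(r"\bnodul(?:e|es|ar)\b", re.IGNORECASE)
NEGATION_CUE = re.compile(r"\b(?:no|without|negative for|free of)\b", re.IGNORECASE)


def split_sentences(report: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def sentence_mentions_nodule(sentence: str) -> bool:
    """True if the sentence names a nodule and no negation cue appears before it.

    Deliberately crude: any cue earlier in the sentence suppresses the match.
    """
    match = NODULE_TERM.search(sentence)
    if match is None:
        return False
    return NEGATION_CUE.search(sentence[: match.start()]) is None


def report_has_nodule(report: str) -> bool:
    """Classify a whole report as nodule-positive if any sentence is positive."""
    return any(sentence_mentions_nodule(s) for s in split_sentences(report))


def sensitivity_specificity(
    predicted: Iterable[bool], actual: Iterable[bool]
) -> tuple[float, float]:
    """Return (TP / (TP + FN), TN / (TN + FP)); NaN where a denominator is zero."""
    tp = fn = tn = fp = 0
    for pred, truth in zip(predicted, actual):
        if truth and pred:
            tp += 1
        elif truth:
            fn += 1
        elif not pred:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    reports = [
        "No pulmonary nodules.",
        "Stable 4 mm nodule in the right upper lobe.",
        "Clear lungs. No effusion.",
    ]
    reference = [False, True, False]  # hypothetical manual labels
    predictions = [report_has_nodule(r) for r in reports]
    print(sensitivity_specificity(predictions, reference))  # prints (1.0, 1.0)
```

A keyword-plus-negation rule like this is easy to audit, which suits report screening, but it misses phrasings such as "no mass but a nodule is seen"; a real system would need richer negation handling and validation against manually labelled reports, as the study did.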
Pages: 10