Development and Validation of a Natural Language Processing Algorithm for Extracting Clinical and Pathological Features of Breast Cancer From Pathology Reports

Cited by: 0
Authors
Munzone, Elisabetta [1]
Marra, Antonio [2]
Comotto, Federico [3]
Guercio, Lorenzo [3]
Sangalli, Claudia Anna [4]
Lo Cascio, Martina [5]
Pagan, Eleonora [6]
Sangalli, Davide [5]
Bigoni, Ilaria [3]
Porta, Francesca Maria [7]
D'Ercole, Marianna [7]
Ritorti, Fabiana [3]
Bagnardi, Vincenzo [6]
Fusco, Nicola [7, 8]
Curigliano, Giuseppe [2, 8]
Affiliations
[1] IRCCS, European Inst Oncol, Div Med Senol, Milan, Italy
[2] IRCCS, European Inst Oncol, Div Early Drug Dev Innovat Therapies, Milan, Italy
[3] Reply SPA, Turin, Italy
[4] IRCCS, European Inst Oncol, Clin Trial Off, Milan, Italy
[5] IRCCS, European Inst Oncol, Cent Management Informat Syst & Technol, Milan, Italy
[6] Univ Milano Bicocca, Dept Stat & Quantitat Methods, Milan, Italy
[7] IRCCS, European Inst Oncol, Div Pathol, Milan, Italy
[8] Univ Milan, Dept Oncol & Hemato Oncol, Milan, Italy
DOI
10.1200/CCI.24.00034
Chinese Library Classification (CLC) number
R73 [Oncology];
Discipline classification code
100214;
Abstract
PURPOSE: Electronic health records (EHRs) are valuable information repositories that offer insights for enhancing clinical research on breast cancer (BC) using real-world data. The objective of this study was to develop a natural language processing (NLP) model specifically designed to extract structured data from BC pathology reports written in natural language.

METHODS: During the initial phase, the algorithm's development cohort comprised 193 pathology reports from 116 patients with BC from 2012 to 2016. A rule-based NLP algorithm was applied to extract 26 variables for analysis and was compared with the manual extraction of data performed by both a data entry specialist and an oncologist. Following the first approach, the data set was expanded to include 513 reports, and a Named Entity Recognition (NER)-NLP model was trained and evaluated using K-fold cross-validation.

RESULTS: The first approach led to a concordance analysis, which revealed an 82.9% agreement between the algorithm and the oncologist, whereas the concordance between the data entry specialist and the oncologist was 90.8%. The second training approach introduced the definition of an NER-NLP model, in which the accuracy showed remarkable potential (97.8%). Notably, the model demonstrated remarkable performance, especially for parameters such as estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and Ki-67 (F1-score 1.0).

CONCLUSION: The present study aligns with the rapidly evolving field of artificial intelligence (AI) applications in oncology, seeking to expedite the development of complex cancer databases and registries. The results of the model are currently undergoing postprocessing procedures to organize the data into tabular structures, facilitating their utilization in real-world clinical and research endeavors. A high-accuracy NLP model was developed to extract structured data from breast cancer pathology reports.
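
The rule-based extraction and the concordance analysis described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the study's rules, variable names, or report wording, so the regular expressions in `PATTERNS`, the `extract` and `concordance` helpers, and the sample report text are all hypothetical. Concordance is computed here as simple percent agreement over variables, which may differ from the statistic used in the study.

```python
import re

# Hypothetical rules for four of the variables named in the abstract.
# The study's actual rules are not given in the abstract.
PATTERNS = {
    "ER": re.compile(r"\b(?:ER|estrogen receptors?)\b\D{0,20}?(\d{1,3})\s*%", re.I),
    "PgR": re.compile(r"\b(?:PgR|PR|progesterone receptors?)\b\D{0,20}?(\d{1,3})\s*%", re.I),
    "Ki67": re.compile(r"\bKi-?67\b\D{0,20}?(\d{1,3})\s*%", re.I),
    "HER2": re.compile(r"\bHER-?2\b[^.;\n]{0,30}?\b(0|1\+|2\+|3\+)", re.I),
}


def extract(report_text):
    """Return one value per variable, or None when no rule matches."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report_text)
        result[name] = match.group(1) if match else None
    return result


def concordance(predicted, reference):
    """Fraction of (report, variable) pairs on which two extractions agree."""
    total = 0
    agree = 0
    for pred, ref in zip(predicted, reference):
        for variable, ref_value in ref.items():
            total += 1
            if pred.get(variable) == ref_value:
                agree += 1
    return agree / total if total else float("nan")


# Invented report text and invented manual annotation, for illustration only.
sample_report = (
    "Invasive ductal carcinoma. ER: 95%, PgR: 80%, Ki-67: 22%. HER2 score 2+."
)
algorithm_output = extract(sample_report)
manual_annotation = {"ER": "95", "PgR": "80", "Ki67": "20", "HER2": "2+"}

agreement = concordance([algorithm_output], [manual_annotation])
print(algorithm_output)
print(f"Agreement: {agreement:.1%}")
```

In this toy case the two extractions disagree only on Ki-67, so three of four variables agree. A real pipeline would need rules written for the language and phrasing of the source reports, and handling for negations and multiple specimens per report.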
Pages: 9
Related papers
50 in total
  • [41] Utilizing Natural Language Processing (NLP) to identify breast cancer associated-lung metastases from pathology reports to delineate characteristics and challenges of this common site of breast cancer recurrence.
    Lopez, Jose Carlos Valentin
    Ho, Alice Y.
    Moy, Beverly
    Isakoff, Steven J.
    Juric, Dejan
    Ellisen, Leif W.
    Peppercorn, Jeffrey M.
    Bardia, Aditya
    Hughes, Kevin S.
    Vidula, Neelima
    JOURNAL OF CLINICAL ONCOLOGY, 2022, 40 (16)
  • [42] Development and Validation of a Natural Language Processing Algorithm to Extract Descriptors of Microbial Keratitis From the Electronic Health Record
    Woodward, Maria A.
    Maganti, Nenita
    Niziol, Leslie M.
    Amin, Sejal
    Hou, Andrew
    Singh, Karandeep
    CORNEA, 2021, 40 (12) : 1548 - 1553
  • [43] Extracting Diagnostic Data from Unstructured Bone Marrow Biopsy Reports of Myeloid Neoplasms Utilizing a Customized Natural Language Processing (NLP) Algorithm
    Kunz, Isaac
    Peddinti, Ananth
    Nguyen, Tina
    Ward, Morgan
    Asay, Alexandra
    Deininger, Michael W.
    Tantravahi, Srinivas K.
    Courdy, Samir
    BLOOD, 2018, 132
  • [44] Development and Validation of an Algorithm to Identify Prostate Cancer Related Mortality in Electronic Medical Records Using Natural Language Processing
    DiBello, Julia R.
    Wallner, Lauren P.
    Zheng, Chengyi
    Yu, Wei
    Li, Bonnie H.
    VanDenEeden, Stephen K.
    Weinmann, Sheila
    Ritzwoller, Debra
    Richert-Boe, Kathryn
    Jacobsen, Stephen J.
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2015, 24 : 418 - 419
  • [45] Natural language processing for automated breast cancer recurrence detection and classification in computed tomography reports
    Lee, Jaimie
    Zepeda, Andres
    Arbour, Gregory
    Isaac, Kathryn V.
    Ng, Raymond T.
    Nichol, Alan
    JOURNAL OF CLINICAL ONCOLOGY, 2024, 42 (16)
  • [46] Automated Identification of Breast Cancer Relapse in Computed Tomography Reports Using Natural Language Processing
    Lee, Jaimie J.
    Zepeda, Andres
    Arbour, Gregory
    Isaac, Kathryn V.
    Ng, Raymond T.
    Nichol, Alan M.
    JCO CLINICAL CANCER INFORMATICS, 2024, 8
  • [47] Use of natural language processing on mammography and pathology findings to supplement BIRADS to improve clinical decision making in breast cancer care
    Puppala, M.
    He, T. C.
    Ogunti, R.
    Wong, S. T. C.
    CANCER RESEARCH, 2017, 77
  • [48] Extracting Clinical Features From Dictated Ambulatory Consult Notes Using a Commercially Available Natural Language Processing Tool: Pilot, Retrospective, Cross-Sectional Validation Study
    Petch, Jeremy
    Batt, Jane
    Murray, Joshua
    Mamdani, Muhammad
    JMIR MEDICAL INFORMATICS, 2019, 7 (04) : 69 - 79
  • [49] Extraction of Breast Cancer Biomarker Data from Narrative Clinical Documents Using Natural Language Processing
    He, Jinghua
    Ouyang, Fangqian
    Eckert, George
    Martin, Joel
    Church, Abby
    Knapp, Kristina
    Dexter, Paul
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2017, 26 : 36 - 36
  • [50] A Preliminary Study of Extracting Pulmonary Nodules and Nodule Characteristics from Radiology Reports Using Natural Language Processing
    Yang, Shuang
    Yang, Xi
    Lyu, Tianchen
    He, Xing
    Braithwaite, Dejana
    Mehta, Hiren J.
    Guo, Yi
    Wu, Yonghui
    Bian, Jiang
    2022 IEEE 10TH INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2022), 2022 : 618 - 619