Development and Validation of a Natural Language Processing Algorithm for Extracting Clinical and Pathological Features of Breast Cancer From Pathology Reports

Cited by: 0
Authors
Munzone, Elisabetta [1]
Marra, Antonio [2]
Comotto, Federico [3]
Guercio, Lorenzo [3]
Sangalli, Claudia Anna [4]
Lo Cascio, Martina [5]
Pagan, Eleonora [6]
Sangalli, Davide [5]
Bigoni, Ilaria [3]
Porta, Francesca Maria [7]
D'Ercole, Marianna [7]
Ritorti, Fabiana [3]
Bagnardi, Vincenzo [6]
Fusco, Nicola [7, 8]
Curigliano, Giuseppe [2, 8]
Affiliations
[1] IRCCS, European Inst Oncol, Div Med Senol, Milan, Italy
[2] IRCCS, European Inst Oncol, Div Early Drug Dev Innovat Therapies, Milan, Italy
[3] Reply SPA, Turin, Italy
[4] IRCCS, European Inst Oncol, Clin Trial Off, Milan, Italy
[5] IRCCS, European Inst Oncol, Cent Management Informat Syst & Technol, Milan, Italy
[6] Univ Milano Bicocca, Dept Stat & Quantitat Methods, Milan, Italy
[7] IRCCS, European Inst Oncol, Div Pathol, Milan, Italy
[8] Univ Milan, Dept Oncol & Hemato Oncol, Milan, Italy
DOI
10.1200/CCI.24.00034
Chinese Library Classification
R73 [Oncology];
Discipline classification code
100214;
Abstract
PURPOSE: Electronic health records (EHRs) are valuable information repositories that offer insights for enhancing clinical research on breast cancer (BC) using real-world data. The objective of this study was to develop a natural language processing (NLP) model specifically designed to extract structured data from BC pathology reports written in natural language.
METHODS: During the initial phase, the algorithm's development cohort comprised 193 pathology reports from 116 patients with BC from 2012 to 2016. A rule-based NLP algorithm was applied to extract 26 variables for analysis and was compared with the manual extraction of data performed by both a data entry specialist and an oncologist. Following the first approach, the data set was expanded to include 513 reports, and a Named Entity Recognition (NER)-NLP model was trained and evaluated using K-fold cross-validation.
RESULTS: The first approach led to a concordance analysis, which revealed an 82.9% agreement between the algorithm and the oncologist, whereas the concordance between the data entry specialist and the oncologist was 90.8%. The second training approach introduced the definition of an NER-NLP model, in which the accuracy showed remarkable potential (97.8%). Notably, the model demonstrated remarkable performance, especially for parameters such as estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and Ki-67 (F1-score 1.0).
CONCLUSION: The present study aligns with the rapidly evolving field of artificial intelligence (AI) applications in oncology, seeking to expedite the development of complex cancer databases and registries. The results of the model are currently undergoing postprocessing procedures to organize the data into tabular structures, facilitating their utilization in real-world clinical and research endeavors. A high-accuracy NLP model was developed to extract structured data from breast cancer pathology reports.
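To make the first approach in the abstract concrete, the following is a minimal Python sketch of rule-based extraction followed by a percent-agreement concordance check. It is not the authors' algorithm: the abstract does not give the rules, the 26 variables, or the report language, so the regular expressions, the variable names (`ER`, `PgR`, `Ki-67`), the sample report text, and the helper names `extract` and `concordance` are all illustrative assumptions.

```python
import re

# Hypothetical patterns for three biomarkers reported as percentages.
# The study's real rules are not described in the abstract.
PATTERNS = {
    "ER": re.compile(r"\bER\b[^\d%]{0,20}(\d{1,3})\s*%", re.IGNORECASE),
    "PgR": re.compile(r"\bPgR\b[^\d%]{0,20}(\d{1,3})\s*%", re.IGNORECASE),
    "Ki-67": re.compile(r"\bKi-?67\b[^\d%]{0,20}(\d{1,3})\s*%", re.IGNORECASE),
}


def extract(report):
    """Return one value per variable, or None when no rule matches."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report)
        result[name] = int(match.group(1)) if match else None
    return result


def concordance(a, b):
    """Percentage of shared variables on which two extractions agree."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    agreeing = sum(a[key] == b[key] for key in shared)
    return 100.0 * agreeing / len(shared)


# Invented example report and an invented manual annotation.
algorithm = extract("ER: 90%, PgR: 40%, Ki-67: 25%")
manual = {"ER": 90, "PgR": 40, "Ki-67": 30}
agreement = concordance(algorithm, manual)
```

Here the algorithm and the manual annotation agree on two of three variables. The study reports agreement over 26 variables and many reports, and the abstract does not say how that agreement was aggregated.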
Pages: 9
Related papers
50 in total
  • [21] Extracting Clinical Information from Free-text of Pathology and Operation Notes via Chinese Natural Language Processing
    Zeng, Qiang
    Zhang, Xiaoyan
    Zhang, Weide
    Li, Zuofeng
    Liu, Lei
    2010 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS (BIBMW), 2010, : 593 - 597
  • [22] Natural Language Processing in a Clinical Decision Support System for the Identification of Venous Thromboembolism: Algorithm Development and Validation
    Jin, Zhi-Geng
    Zhang, Hui
    Tai, Mei-Hui
    Yang, Ying
    Yao, Yuan
    Guo, Yu-Tao
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2023, 25
  • [23] Development and validation of natural language processing (NLP) algorithm for detection of distant versus local breast cancer recurrence and metastatic site.
    Karimi, Yasmin
    Blayney, Douglas W.
    Kurian, Allison W.
    Rubin, Daniel
    Banerjee, Imon
    JOURNAL OF CLINICAL ONCOLOGY, 2020, 38 (15)
  • [24] Identifying Cancer Recurrence from the Electronic Medical Record: Using Natural Language Processing to Interpret Pathology Reports
    Bayona, D.
    Ebner, D. K.
    Weiskittle, T. M.
    Talom, B. C. Kamdem
    Kowalchuk, R. O.
    Breen, W.
    Routman, D. M.
    Waddle, M. R.
    INTERNATIONAL JOURNAL OF RADIATION ONCOLOGY BIOLOGY PHYSICS, 2024, 120 (02): : E611 - E611
  • [25] Natural Language Processing to extract SNOMED-CT codes from pathological reports
    Cazzaniga, Giorgio
    Eccher, Albino
    Munari, Enrico
    Marletta, Stefano
    Bonoldi, Emanuela
    Della Mea, Vincenzo
    Cadei, Moris
    Sbaraglia, Marta
    Guerriero, Angela
    Dei Tos, Angelo Paolo
    Pagni, Fabio
    L'Imperio, Vincenzo
    PATHOLOGICA, 2023, 115 (06) : 318 - 324
  • [26] Development and Validation of a Natural Language Processing Computer Program to Evaluate the Quality of Colonoscopy Reports
    Harkema, Hendrik
    Bishehsari, Faraz
    Dellon, Evan S.
    Saul, Melissa I.
    Chapman, Wendy
    Farmer, Carrie M.
    Mehrotra, Ateev
    Schoen, Robert E.
    GASTROENTEROLOGY, 2011, 140 (05) : S413 - S413
  • [27] Extracting lung cancer staging descriptors from pathology reports: A generative language model approach
    Cho, Hyeongmin
    Yoo, Sooyoung
    Kim, Borham
    Jang, Sowon
    Sunwoo, Leonard
    Kim, Sanghwan
    Lee, Donghyoung
    Kim, Seok
    Nam, Sejin
    Chung, Jin-Haeng
    JOURNAL OF BIOMEDICAL INFORMATICS, 2024, 157
  • [28] Natural Language Processing to Abstract Preneoplastic and Incidental Pulmonary Lesions from Pathology Reports
    Petricca, J.
    French, C.
    Ajaj, R.
    Zelifan, A.
    Grant, B.
    Zhan, L.
    Zhang, Y.
    Thakral, A.
    Nicholls, D.
    Hsu, Y. -H. R.
    Pal, P.
    Cabanero, M.
    Tsao, M. S.
    Liu, G.
    JOURNAL OF THORACIC ONCOLOGY, 2022, 17 (09) : S515 - S515
  • [29] Anatomic stage extraction from medical reports of breast Cancer patients using natural language processing
    Pratiksha R. Deshmukh
    Rashmi Phalnikar
    Health and Technology, 2020, 10 : 1555 - 1570
  • [30] Anatomic stage extraction from medical reports of breast Cancer patients using natural language processing
    Deshmukh, Pratiksha R.
    Phalnikar, Rashmi
    HEALTH AND TECHNOLOGY, 2020, 10 (06) : 1555 - 1570