Deep Learning for Natural Language Processing in Urology: State-of-the-Art Automated Extraction of Detailed Pathologic Prostate Cancer Data From Narratively Written Electronic Health Records

Cited by: 29
Authors
Leyh-Bannurah, Sami-Ramzi [1, 2]
Tian, Zhe [4]
Karakiewicz, Pierre I. [4]
Wolffgang, Ulrich [3]
Sauter, Guido [2]
Fisch, Margit [2]
Pehrke, Dirk [1]
Huland, Hartwig [1]
Graefen, Markus [1]
Budaeus, Lars [1]
Affiliations
[1] Prostate Canc Ctr Hamburg Eppendorf, Hamburg, Germany
[2] Univ Med Ctr Hamburg Eppendorf, Hamburg, Germany
[3] Univ Munster, Munster, Germany
[4] Univ Montreal, Hlth Ctr, Montreal, PQ, Canada
DOI
10.1200/CCI.18.00080
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Purpose: Entering all information from narrative documentation for clinical research into databases is time consuming, costly, and nearly impossible. Even high-volume databases do not cover all patient characteristics, so the conclusions drawn from them may be limited. A viable new automated solution is machine learning based on deep neural networks applied to natural language processing (NLP), which extracts detailed information from narratively written electronic health records (EHRs), such as pathologic radical prostatectomy (RP) reports.
Methods: Within an RP pathologic database, 3,679 RP EHRs were randomly split into 70% training and 30% test data sets. First, training EHRs were automatically annotated, providing a semiautomatically annotated corpus of narratively written pathologic reports with initially context-free gold standard encodings. The variables of interest were primary and secondary Gleason pattern, corresponding percentages, tumor stage, nodal stage, total volume, tumor volume and diameter, and surgical margin. Second, state-of-the-art NLP techniques were used to train an industry-standard language model for pathologic EHRs by transfer learning. Finally, the accuracy of the named entity extractors was compared with the gold standard encodings.
Results: Agreement rates (95% confidence interval) were 91.3% (89.4 to 93.0) each for primary and secondary Gleason patterns; 70.5% (67.6 to 73.3) and 80.9% (78.4 to 83.3) for the corresponding Gleason percentages; 99.3% (98.6 to 99.7) for tumor stage; 98.7% (97.8 to 99.3) for nodal stage; 98.3% (97.3 to 99.0) for total volume; 93.3% (91.6 to 94.8) for tumor volume; 96.3% (94.9 to 97.3) for maximum diameter; and 98.7% (97.8 to 99.3) for surgical margin. Cumulative agreement was 91.3%.
Conclusion: Our proposed NLP pipeline offers new abilities for precise and efficient data management from narrative documentation for clinical research. The scalable approach potentially allows the NLP pipeline to be generalized to other genitourinary EHRs, tumor entities, and other medical disciplines.
Clin Cancer Inform. © 2018 by American Society of Clinical Oncology
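The evaluation design described in the abstract (a random 70%/30% split of 3,679 reports, then per-variable agreement rates with 95% confidence intervals) can be illustrated with a minimal Python sketch. This is not the authors' code. The abstract does not state which interval method was used, so the Wilson score interval here is an assumption, and the function names, the seed, and the 90-of-100 example count are hypothetical.

```python
import math
import random


def split_train_test(record_ids, train_fraction=0.7, seed=0):
    """Randomly split record identifiers into training and test lists.

    The seed and the rounding rule are illustrative choices, not taken
    from the paper.
    """
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def agreement_with_wilson_ci(n_agree, n_total, z=1.96):
    """Return (rate, lower, upper) for an agreement proportion.

    Uses the Wilson score interval; the paper may have used a different
    method (for example an exact binomial interval).
    """
    p = n_agree / n_total
    denom = 1 + z * z / n_total
    center = (p + z * z / (2 * n_total)) / denom
    half = z * math.sqrt(p * (1 - p) / n_total + z * z / (4 * n_total ** 2)) / denom
    return p, center - half, center + half


# 3,679 reports, as stated in the abstract; the integer IDs are placeholders.
train_ids, test_ids = split_train_test(range(3679))
print(len(train_ids), len(test_ids))

# Hypothetical counts for illustration only: 90 of 100 extractions agree.
rate, lower, upper = agreement_with_wilson_ci(90, 100)
print(f"{rate:.1%} ({lower:.1%} to {upper:.1%})")
```

With a 70% training fraction, this split leaves roughly 1,100 reports for testing, which is the denominator that each agreement rate in the Results would be computed over if every test report contributed to every variable (an assumption the abstract does not confirm).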
Pages: 1-9
Page count: 9