Deep Learning for Natural Language Processing in Urology: State-of-the-Art Automated Extraction of Detailed Pathologic Prostate Cancer Data From Narratively Written Electronic Health Records

Cited by: 29
Authors
Leyh-Bannurah, Sami-Ramzi [1 ,2 ]
Tian, Zhe [4 ]
Karakiewicz, Pierre, I [4 ]
Wolffgang, Ulrich [3 ]
Sauter, Guido [2 ]
Fisch, Margit [2 ]
Pehrke, Dirk [1 ]
Huland, Hartwig [1 ]
Graefen, Markus [1 ]
Budaeus, Lars [1 ]
Affiliations
[1] Prostate Canc Ctr Hamburg Eppendorf, Hamburg, Germany
[2] Univ Med Ctr Hamburg Eppendorf, Hamburg, Germany
[3] Univ Munster, Munster, Germany
[4] Univ Montreal, Hlth Ctr, Montreal, PQ, Canada
Source
JCO Clinical Cancer Informatics, 2018
DOI
10.1200/CCI.18.00080
Chinese Library Classification (CLC) number
R73 [Oncology]
Discipline classification code
100214
Abstract
Purpose
Entering all information from narrative documentation into databases for clinical research is time consuming, costly, and nearly impossible. Even high-volume databases do not cover all patient characteristics, so the results drawn from them may be limited. A viable new automated solution is machine learning based on deep neural networks applied to natural language processing (NLP), which extracts detailed information from narratively written electronic health records (EHRs), for example, pathologic radical prostatectomy (RP) reports.
Methods
Within an RP pathologic database, 3,679 RP EHRs were randomly split into a training data set (70%) and a test data set (30%). Training EHRs were automatically annotated, providing a semiautomatically annotated corpus of narratively written pathologic reports with initially context-free gold standard encodings. The variables of interest were primary and secondary Gleason pattern, the corresponding percentages, tumor stage, nodal stage, total volume, tumor volume and diameter, and surgical margin. Next, state-of-the-art NLP techniques were used to train an industry-standard language model for pathologic EHRs by transfer learning. Finally, the accuracy of the named entity extractors was compared with the gold standard encodings.
Results
Agreement rates (95% confidence interval) were as follows:
- Primary Gleason pattern: 91.3% (89.4 to 93.0).
- Secondary Gleason pattern: 91.3% (89.4 to 93.0).
- Percentages corresponding to the primary and secondary Gleason patterns: 70.5% (67.6 to 73.3) and 80.9% (78.4 to 83.3).
- Tumor stage: 99.3% (98.6 to 99.7).
- Nodal stage: 98.7% (97.8 to 99.3).
- Total volume: 98.3% (97.3 to 99.0).
- Tumor volume: 93.3% (91.6 to 94.8).
- Maximum diameter: 96.3% (94.9 to 97.3).
- Surgical margin: 98.7% (97.8 to 99.3).

Cumulative agreement was 91.3%.
Conclusion
Our proposed NLP pipeline offers new abilities for precise and efficient data management from narrative documentation for clinical research. The scalable approach potentially allows the NLP pipeline to be generalized to other genitourinary EHRs, tumor entities, and other medical disciplines.
JCO Clin Cancer Inform. © 2018 by American Society of Clinical Oncology
Pages: 1-9
Page count: 9
Related papers
  • [1] STATE-OF-THE-ART AUTOMATED EXTRACTION OF DETAILED PATHOLOGICAL DATA FROM NARRATIVELY WRITTEN ELECTRONIC HEALTH RECORDS
    Leyh-Bannurah, Sami-Ramzi
    Zhe, Tian
    Karakiewicz, Pierre
    Wolfgang, Ulrich
    Pehrke, Dirk
    Fisch, Margit
    Huland, Hartwig
    Graefen, Markus
    Budaeus, Lars
    JOURNAL OF UROLOGY, 2018, 199 (04): E934
  • [2] DEEP LEARNING IN NATURAL LANGUAGE PROCESSING: A STATE-OF-THE-ART SURVEY
    Chai, Junyi
    Li, Anming
    PROCEEDINGS OF 2019 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), 2019: 535-540
  • [3] A NEW ERA: AUTOMATED EXTRACTION OF DETAILED PROSTATE CANCER INFORMATION FROM NARRATIVELY WRITTEN HEALTH RECORDS. PIONEER WORK FROM A EUROPEAN TERTIARY CARE CENTER
    Leyh-Bannurah, Sami-Ramzi
    Tian, Zhe
    Karakiewicz, Pierre
    Pehrke, Dirk
    Huland, Hartwig
    Graefen, Markus
    Budaeus, Lars
    JOURNAL OF UROLOGY, 2017, 197 (04): E676
  • [4] Validation of deep learning natural language processing algorithm for keyword extraction from pathology reports in electronic health records
    Kim, Yoojoong
    Lee, Jeong Hyeon
    Choi, Sunho
    Lee, Jeong Moon
    Kim, Jong-Ho
    Seok, Junhee
    Joo, Hyung Joon
    SCIENTIFIC REPORTS, 2020, 10 (01)
  • [5] Automated Extraction of Stroke Severity From Unstructured Electronic Health Records Using Natural Language Processing
    Fernandes, Marta
    Westover, M. Brandon
    Singhal, Aneesh B.
    Zafar, Sahar F.
    JOURNAL OF THE AMERICAN HEART ASSOCIATION, 2024, 13 (21)
  • [6] RETRACTED ARTICLE: Analysis of Electronic Health Records Based on Deep Learning with Natural Language Processing
    Shen, Yi-Cheng
    Hsia, Te-Chun
    Hsu, Ching-Hsien
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2023, 48: 2597
  • [7] Application of Natural Language Processing in Electronic Health Record Data Extraction for Navigating Prostate Cancer Care: A Narrative Review
    Bhatia, Ansh
    Titus, Renil
    Porto, Joao G.
    Katz, Jonathan
    Lopategui, Diana M.
    Marcovich, Robert
    Parekh, Dipen J.
    Shah, Hemendra N.
    JOURNAL OF ENDOUROLOGY, 2024
  • [8] Deep Learning Approaches for Predicting Glaucoma Progression Using Electronic Health Records and Natural Language Processing
    Wang, Sophia Y.
    Tseng, Benjamin
    Hernandez-Boussard, Tina
    OPHTHALMOLOGY SCIENCE, 2022, 2 (02)
  • [9] Deep Learning-Based Natural Language Processing to Automate Esophagitis Severity Grading from the Electronic Health Records
    Chen, S.
    Guevara, M.
    Ramirez, N.
    Aerts, H.
    Miller, T. A.
    Savova, G. K.
    Mak, R. H.
    Bitterman, D. S.
    INTERNATIONAL JOURNAL OF RADIATION ONCOLOGY BIOLOGY PHYSICS, 2023, 117 (02): S18