Deep Learning for Natural Language Processing in Urology: State-of-the-Art Automated Extraction of Detailed Pathologic Prostate Cancer Data From Narratively Written Electronic Health Records

Cited by: 29

Authors:

Leyh-Bannurah, Sami-Ramzi [1,2]

Tian, Zhe [4]

Karakiewicz, Pierre I. [4]

Wolffgang, Ulrich [3]

Sauter, Guido [2]

Fisch, Margit [2]

Pehrke, Dirk [1]

Huland, Hartwig [1]

Graefen, Markus [1]

Budaeus, Lars [1]

Affiliations:

[1] Prostate Canc Ctr Hamburg Eppendorf, Hamburg, Germany

[2] Univ Med Ctr Hamburg Eppendorf, Hamburg, Germany

[3] Univ Munster, Munster, Germany

[4] Univ Montreal, Hlth Ctr, Montreal, PQ, Canada

Source:

JCO Clinical Cancer Informatics | 2018 / Volume 2

DOI:

10.1200/CCI.18.00080

Chinese Library Classification:

R73 [Oncology];

Discipline classification code:

100214;

Abstract:

Purpose: Entering all information from narrative documentation for clinical research into databases is time consuming, costly, and nearly impossible. Even high-volume databases do not cover all patient characteristics and drawn results may be limited. A new viable automated solution is machine learning based on deep neural networks applied to natural language processing (NLP), extracting detailed information from narratively written (eg, pathologic radical prostatectomy [RP]) electronic health records (EHRs).

Methods: Within an RP pathologic database, 3,679 RP EHRs were randomly split into 70% training and 30% test data sets. Training EHRs were automatically annotated, providing a semiautomatically annotated corpus of narratively written pathologic reports with initially context-free gold standard encodings. Primary and secondary Gleason pattern, corresponding percentages, tumor stage, nodal stage, total volume, tumor volume and diameter, and surgical margin were variables of interest. Second, state-of-the-art NLP techniques were used to train an industry-standard language model for pathologic EHRs by transfer learning. Finally, accuracy of the named entity extractors was compared with the gold standard encodings.

Results: Agreement rates (95% confidence interval) for primary and secondary Gleason patterns each were 91.3% (89.4 to 93.0), corresponding to the following: Gleason percentages, 70.5% (67.6 to 73.3) and 80.9% (78.4 to 83.3); tumor stage, 99.3% (98.6 to 99.7); nodal stage, 98.7% (97.8 to 99.3); total volume, 98.3% (97.3 to 99.0); tumor volume, 93.3% (91.6 to 94.8); maximum diameter, 96.3% (94.9 to 97.3); and surgical margin, 98.7% (97.8 to 99.3). Cumulative agreement was 91.3%.

Conclusion: Our proposed NLP pipeline offers new abilities for precise and efficient data management from narrative documentation for clinical research. The scalable approach potentially allows the NLP pipeline to be generalized to other genitourinary EHRs, tumor entities, and other medical disciplines.

Clin Cancer Inform. © 2018 by American Society of Clinical Oncology
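
The evaluation design described in the abstract (a random 70%/30% split, then per-variable agreement with the gold standard reported with a 95% confidence interval) can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_records`, `agreement_counts`, and `wilson_interval` are hypothetical, the seed is arbitrary, and the Wilson score interval is an assumption, since the abstract does not state which interval method was used.

```python
import math
import random


def split_records(record_ids, train_fraction=0.7, seed=0):
    """Randomly split record IDs into training and test lists (hypothetical helper)."""
    ids = list(record_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def agreement_counts(predicted, gold):
    """Count exact matches between extracted values and gold standard encodings."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches, len(gold)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed method, z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# 3,679 records as in the abstract; the IDs themselves are placeholders.
train_ids, test_ids = split_records(range(3679))

# Toy example: extracted vs. gold values for one variable on four reports.
matches, total = agreement_counts(["pT2", "pT3a", "pT2", "pT3b"],
                                  ["pT2", "pT3a", "pT2", "pT3a"])
low, high = wilson_interval(matches, total)
print(f"agreement {matches / total:.1%} (95% CI {low:.1%} to {high:.1%})")
```

With so few toy reports the interval is very wide; on a test set of roughly 1,100 reports, as implied by the 30% split, intervals of a few percentage points like those in the abstract would be expected.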

Pages: 1 / 9

Page count: 9
