Deep Learning for Natural Language Processing in Urology: State-of-the-Art Automated Extraction of Detailed Pathologic Prostate Cancer Data From Narratively Written Electronic Health Records

Cited by: 29

Authors:

Leyh-Bannurah, Sami-Ramzi [1,2]

Tian, Zhe [4]

Karakiewicz, Pierre I. [4]

Wolffgang, Ulrich [3]

Sauter, Guido [2]

Fisch, Margit [2]

Pehrke, Dirk [1]

Huland, Hartwig [1]

Graefen, Markus [1]

Budaeus, Lars [1]

Affiliations:

[1] Prostate Canc Ctr Hamburg Eppendorf, Hamburg, Germany

[2] Univ Med Ctr Hamburg Eppendorf, Hamburg, Germany

[3] Univ Munster, Munster, Germany

[4] Univ Montreal, Hlth Ctr, Montreal, PQ, Canada

Source:

JCO CLINICAL CANCER INFORMATICS | 2018 / Volume 2

Keywords:

DOI:

10.1200/CCI.18.00080

Chinese Library Classification:

R73 [Oncology];

Discipline classification code:

100214;

Abstract:

Purpose: Entering all information from narrative documentation for clinical research into databases is time consuming, costly, and nearly impossible. Even high-volume databases do not cover all patient characteristics and drawn results may be limited. A new viable automated solution is machine learning based on deep neural networks applied to natural language processing (NLP), extracting detailed information from narratively written (eg, pathologic radical prostatectomy [RP]) electronic health records (EHRs).

Methods: Within an RP pathologic database, 3,679 RP EHRs were randomly split into 70% training and 30% test data sets. Training EHRs were automatically annotated, providing a semiautomatically annotated corpus of narratively written pathologic reports with initially context-free gold standard encodings. Primary and secondary Gleason pattern, corresponding percentages, tumor stage, nodal stage, total volume, tumor volume and diameter, and surgical margin were variables of interest. Second, state-of-the-art NLP techniques were used to train an industry-standard language model for pathologic EHRs by transfer learning. Finally, accuracy of the named entity extractors was compared with the gold standard encodings.

Results: Agreement rates (95% confidence interval) for primary and secondary Gleason patterns each were 91.3% (89.4 to 93.0), corresponding to the following: Gleason percentages, 70.5% (67.6 to 73.3) and 80.9% (78.4 to 83.3); tumor stage, 99.3% (98.6 to 99.7); nodal stage, 98.7% (97.8 to 99.3); total volume, 98.3% (97.3 to 99.0); tumor volume, 93.3% (91.6 to 94.8); maximum diameter, 96.3% (94.9 to 97.3); and surgical margin, 98.7% (97.8 to 99.3). Cumulative agreement was 91.3%.

Conclusion: Our proposed NLP pipeline offers new abilities for precise and efficient data management from narrative documentation for clinical research. The scalable approach potentially allows the NLP pipeline to be generalized to other genitourinary EHRs, tumor entities, and other medical disciplines.

Clin Cancer Inform. © 2018 by American Society of Clinical Oncology


Pages: 1 / 9

Page count: 9

50 records in total

[31] Automated abstraction of real-world clinical outcome in lung cancer: A natural language processing and artificial intelligence approach from electronic health records.
Ma, Meng
Redfern, Arielle
Zhou, Xiang
Li, Dan
Ru, Ying
Lee, Kyeryoung
Gilman, Christopher
Liu, Zongzhi
Jones, Scott
Mai, Yun
Deitz, Matthew
Gong, Yunrou
Mullaney, Tommy
Prentice, Tony
Chen, Rong
Schadt, Eric
Wang, Xiaoyan
JOURNAL OF CLINICAL ONCOLOGY, 2020, 38 (15)
[32] Automated extraction of information of lung cancer staging from unstructured reports of PET-CT interpretation: natural language processing with deep-learning
Hyung Jun Park
Namu Park
Jang Ho Lee
Myeong Geun Choi
Jin-Sook Ryu
Min Song
Chang-Min Choi
BMC Medical Informatics and Decision Making, 22
[33] Automated extraction of information of lung cancer staging from unstructured reports of PET-CT interpretation: natural language processing with deep-learning
Park, Hyung Jun
Park, Namu
Lee, Jang Ho
Choi, Myeong Geun
Ryu, Jin-Sook
Song, Min
Choi, Chang-Min
BMC MEDICAL INFORMATICS AND DECISION MAKING, 2022, 22 (01)
[34] Natural Language Processing Tool for Extraction of Patient-Reported Outcomes from a National Multi-Electronic Health Records Registry
Humbert-Droz, Marie
Izadi, Zara
Schmajuk, Gabriela
Gianfrancesco, Milena
Yazdany, Jinoos
Tamang, Suzanne
ARTHRITIS & RHEUMATOLOGY, 2021, 73 : 3955 - 3957
[35] Using natural language processing to construct a metastatic breast cancer cohort from linked cancer registry and electronic medical records data
Ling, Albee Y.
Kurian, Allison W.
Caswell-Jin, Jennifer L.
Sledge, George W., Jr.
Shah, Nigam H.
Tamang, Suzanne R.
JAMIA OPEN, 2019, 2 (04) : 528 - 537
[36] Automated risk assessment of newly detected atrial fibrillation poststroke from electronic health record data using machine learning and natural language processing
Sung, Sheng-Feng
Sung, Kuan-Lin
Pan, Ru-Chiou
Lee, Pei-Ju
Hu, Ya-Han
FRONTIERS IN CARDIOVASCULAR MEDICINE, 2022, 9
[37] Deep and machine learning models to improve risk prediction of cardiovascular disease using data extraction from electronic health records
Korsakov, I.
Gusev, A.
Kuznetsova, T.
Gavrilov, D.
Novitskiy, R.
EUROPEAN HEART JOURNAL, 2019, 40 : 1213 - 1213
[38] Using natural language processing to analyze unstructured patient-reported outcomes data derived from electronic health records for cancer populations: a systematic review
Sim, Jin-Ah
Huang, Xiaolei
Horan, Madeline R.
Baker, Justin N.
Huang, I-Chan
EXPERT REVIEW OF PHARMACOECONOMICS & OUTCOMES RESEARCH, 2024, 24 (04) : 467 - 475
[39] Practical use case of natural language processing for observational clinical research data retrieval from electronic health records: AssistMED project
Maciejewski, Cezary
Ozieranski, Krzysztof
Basza, Mikolaj
Barwiolek, Adam
Ciurla, Michalina
Bozym, Aleksandra
Krajsman, Maciej J.
Lodzinski, Piotr
Opolski, Grzegorz
Grabowski, Marcin
Cacko, Andrzej
Balsam, Pawel
POLISH ARCHIVES OF INTERNAL MEDICINE-POLSKIE ARCHIWUM MEDYCYNY WEWNETRZNEJ, 2024, 134 (05)
[40] A Comparison Of Structured Data Query Methods Versus Natural Language Processing To Identify Metastatic Melanoma Cases From Electronic Health Records
Dexter, Paul R.
He, Jinghua
Mark, Lawrence
Haggstrom, David
Hilton, Charity
Martin, Joel
Baker, Jarod
Duke, Jon
Hui, Siu L.
Li, Xiaochun
PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2016, 25 : 253 - 254
