Deep Learning for Natural Language Processing in Urology: State-of-the-Art Automated Extraction of Detailed Pathologic Prostate Cancer Data From Narratively Written Electronic Health Records

Cited by: 29

Authors:

Leyh-Bannurah, Sami-Ramzi [1,2]

Tian, Zhe [4]

Karakiewicz, Pierre I. [4]

Wolffgang, Ulrich [3]

Sauter, Guido [2]

Fisch, Margit [2]

Pehrke, Dirk [1]

Huland, Hartwig [1]

Graefen, Markus [1]

Budaeus, Lars [1]

Affiliations:

[1] Prostate Canc Ctr Hamburg Eppendorf, Hamburg, Germany

[2] Univ Med Ctr Hamburg Eppendorf, Hamburg, Germany

[3] Univ Munster, Munster, Germany

[4] Univ Montreal, Hlth Ctr, Montreal, PQ, Canada

Source:

JCO CLINICAL CANCER INFORMATICS | 2018 / Vol. 2

Keywords:

DOI:

10.1200/CCI.18.00080

Chinese Library Classification (CLC) number:

R73 [Oncology];

Discipline classification code:

100214 ;

Abstract:

Purpose: Entering all information from narrative documentation for clinical research into databases is time consuming, costly, and nearly impossible. Even high-volume databases do not cover all patient characteristics and drawn results may be limited. A new viable automated solution is machine learning based on deep neural networks applied to natural language processing (NLP), extracting detailed information from narratively written (eg, pathologic radical prostatectomy [RP]) electronic health records (EHRs).

Methods: Within an RP pathologic database, 3,679 RP EHRs were randomly split into 70% training and 30% test data sets. Training EHRs were automatically annotated, providing a semiautomatically annotated corpus of narratively written pathologic reports with initially context-free gold standard encodings. Primary and secondary Gleason pattern, corresponding percentages, tumor stage, nodal stage, total volume, tumor volume and diameter, and surgical margin were variables of interest. Second, state-of-the-art NLP techniques were used to train an industry-standard language model for pathologic EHRs by transfer learning. Finally, accuracy of the named entity extractors was compared with the gold standard encodings.

Results: Agreement rates (95% confidence interval) for primary and secondary Gleason patterns each were 91.3% (89.4 to 93.0), corresponding to the following: Gleason percentages, 70.5% (67.6 to 73.3) and 80.9% (78.4 to 83.3); tumor stage, 99.3% (98.6 to 99.7); nodal stage, 98.7% (97.8 to 99.3); total volume, 98.3% (97.3 to 99.0); tumor volume, 93.3% (91.6 to 94.8); maximum diameter, 96.3% (94.9 to 97.3); and surgical margin, 98.7% (97.8 to 99.3). Cumulative agreement was 91.3%.

Conclusion: Our proposed NLP pipeline offers new abilities for precise and efficient data management from narrative documentation for clinical research. The scalable approach potentially allows the NLP pipeline to be generalized to other genitourinary EHRs, tumor entities, and other medical disciplines.

Clin Cancer Inform. © 2018 by American Society of Clinical Oncology
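The evaluation design described in the abstract (a random 70%/30% split of 3,679 records, then per-variable agreement with gold standard encodings reported with a 95% confidence interval) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the fixed seed, and the choice of a Wilson score interval are assumptions made for illustration, since the abstract does not state how the split was implemented or which interval method was used.

```python
import random
from statistics import NormalDist


def split_indices(n_records, train_fraction=0.7, seed=0):
    """Randomly split record indices into training and test sets.

    The seed is an illustrative assumption; it only makes the split reproducible.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = int(train_fraction * n_records)
    return indices[:n_train], indices[n_train:]


def agreement_counts(extracted, gold):
    """Count exact matches between extracted values and gold standard encodings."""
    if len(extracted) != len(gold):
        raise ValueError("extracted and gold must have the same length")
    matches = sum(e == g for e, g in zip(extracted, gold))
    return matches, len(gold)


def wilson_interval(matches, total, confidence=0.95):
    """Wilson score interval for a proportion (an assumed choice of CI method)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = matches / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * (p * (1 - p) / total + z * z / (4 * total * total)) ** 0.5 / denom
    return center - half, center + half


# 3,679 records, as stated in the abstract.
train_idx, test_idx = split_indices(3679)

# Hypothetical toy values for one variable (tumor stage), not data from the paper.
matches, total = agreement_counts(["pT2", "pT3a", "pT2"], ["pT2", "pT3b", "pT2"])
low, high = wilson_interval(matches, total)
```

With a 70% training fraction, 3,679 records yield 2,575 training and 1,104 test records under this rounding rule; the paper's exact counts may differ by one depending on how the split was rounded.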


Pages: 1 / 9

Page count: 9
