Deep Learning for Natural Language Processing in Urology: State-of-the-Art Automated Extraction of Detailed Pathologic Prostate Cancer Data From Narratively Written Electronic Health Records

Cited by: 29
Authors
Leyh-Bannurah, Sami-Ramzi [1, 2]
Tian, Zhe [4]
Karakiewicz, Pierre I. [4]
Wolffgang, Ulrich [3]
Sauter, Guido [2]
Fisch, Margit [2]
Pehrke, Dirk [1]
Huland, Hartwig [1]
Graefen, Markus [1]
Budaeus, Lars [1]
Affiliations
[1] Prostate Canc Ctr Hamburg Eppendorf, Hamburg, Germany
[2] Univ Med Ctr Hamburg Eppendorf, Hamburg, Germany
[3] Univ Munster, Munster, Germany
[4] Univ Montreal, Hlth Ctr, Montreal, PQ, Canada
DOI
10.1200/CCI.18.00080
Chinese Library Classification
R73 [Oncology]
Subject classification code
100214
Abstract
Purpose: Entering all information from narrative documentation for clinical research into databases is time consuming, costly, and nearly impossible. Even high-volume databases do not cover all patient characteristics, and the conclusions drawn from them may be limited. A new viable automated solution is machine learning based on deep neural networks applied to natural language processing (NLP), extracting detailed information from narratively written electronic health records (EHRs), for example pathologic radical prostatectomy (RP) reports.

Methods: Within an RP pathologic database, 3,679 RP EHRs were randomly split into 70% training and 30% test data sets. First, training EHRs were automatically annotated, providing a semiautomatically annotated corpus of narratively written pathologic reports with initially context-free gold standard encodings. Primary and secondary Gleason pattern, corresponding percentages, tumor stage, nodal stage, total volume, tumor volume and diameter, and surgical margin were the variables of interest. Second, state-of-the-art NLP techniques were used to train an industry-standard language model for pathologic EHRs by transfer learning. Finally, the accuracy of the named entity extractors was compared with the gold standard encodings.

Results: Agreement rates (95% confidence interval) for primary and secondary Gleason patterns were each 91.3% (89.4 to 93.0). Agreement for the corresponding Gleason percentages was 70.5% (67.6 to 73.3) and 80.9% (78.4 to 83.3). For the remaining variables, agreement was: tumor stage, 99.3% (98.6 to 99.7); nodal stage, 98.7% (97.8 to 99.3); total volume, 98.3% (97.3 to 99.0); tumor volume, 93.3% (91.6 to 94.8); maximum diameter, 96.3% (94.9 to 97.3); and surgical margin, 98.7% (97.8 to 99.3). Cumulative agreement was 91.3%.

Conclusion: Our proposed NLP pipeline offers new abilities for precise and efficient data management from narrative documentation for clinical research. The scalable approach potentially allows the NLP pipeline to be generalized to other genitourinary EHRs, tumor entities, and other medical disciplines.

Clin Cancer Inform. © 2018 by American Society of Clinical Oncology
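The evaluation described in the abstract (a random 70%/30% split, then agreement rates with 95% confidence intervals) can be illustrated with a minimal sketch. The abstract does not state how the intervals were computed or the exact test set size, so the Wilson score interval is used here purely as an assumption, and the agreement count of 1,008 is a hypothetical placeholder. The function names `split_records` and `agreement_with_wilson_ci` are illustrative and do not come from the paper.

```python
import math
import random


def split_records(record_ids, train_fraction=0.7, seed=0):
    """Shuffle record IDs and split them into training and test lists."""
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def agreement_with_wilson_ci(n_agree, n_total, z=1.96):
    """Return (rate, lower, upper) using the Wilson score interval.

    Assumption: the paper's interval method is not given in the abstract;
    Wilson is one common choice for a binomial proportion.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    p = n_agree / n_total
    denom = 1 + z * z / n_total
    center = (p + z * z / (2 * n_total)) / denom
    half = z * math.sqrt(p * (1 - p) / n_total + z * z / (4 * n_total ** 2)) / denom
    return p, center - half, center + half


# 3,679 records as in the abstract; the IDs themselves are synthetic.
train_ids, test_ids = split_records(range(3679))

# Hypothetical count of extractions matching the gold standard (not from the paper).
rate, lower, upper = agreement_with_wilson_ci(1008, len(test_ids))
print(f"{len(train_ids)} train / {len(test_ids)} test")
print(f"agreement {rate:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

A different interval method, such as an exact binomial interval, would give slightly different bounds, so the output should not be expected to reproduce the published figures.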
Pages: 1-9
Page count: 9