Deep Learning for Natural Language Processing in Urology: State-of-the-Art Automated Extraction of Detailed Pathologic Prostate Cancer Data From Narratively Written Electronic Health Records

Cited by: 29
Authors
Leyh-Bannurah, Sami-Ramzi [1, 2]
Tian, Zhe [4]
Karakiewicz, Pierre I. [4]
Wolffgang, Ulrich [3]
Sauter, Guido [2]
Fisch, Margit [2]
Pehrke, Dirk [1]
Huland, Hartwig [1]
Graefen, Markus [1]
Budaeus, Lars [1]
Affiliations
[1] Prostate Canc Ctr Hamburg Eppendorf, Hamburg, Germany
[2] Univ Med Ctr Hamburg Eppendorf, Hamburg, Germany
[3] Univ Munster, Munster, Germany
[4] Univ Montreal, Hlth Ctr, Montreal, PQ, Canada
DOI
10.1200/CCI.18.00080
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Purpose: Entering all information from narrative documentation for clinical research into databases is time consuming, costly, and nearly impossible. Even high-volume databases do not cover all patient characteristics and drawn results may be limited. A new viable automated solution is machine learning based on deep neural networks applied to natural language processing (NLP), extracting detailed information from narratively written (eg, pathologic radical prostatectomy [RP]) electronic health records (EHRs).

Methods: Within an RP pathologic database, 3,679 RP EHRs were randomly split into 70% training and 30% test data sets. Training EHRs were automatically annotated, providing a semiautomatically annotated corpus of narratively written pathologic reports with initially context-free gold standard encodings. Primary and secondary Gleason pattern, corresponding percentages, tumor stage, nodal stage, total volume, tumor volume and diameter, and surgical margin were variables of interest. Second, state-of-the-art NLP techniques were used to train an industry-standard language model for pathologic EHRs by transfer learning. Finally, accuracy of the named entity extractors was compared with the gold standard encodings.

Results: Agreement rates (95% confidence interval) for primary and secondary Gleason patterns each were 91.3% (89.4 to 93.0), corresponding to the following: Gleason percentages, 70.5% (67.6 to 73.3) and 80.9% (78.4 to 83.3); tumor stage, 99.3% (98.6 to 99.7); nodal stage, 98.7% (97.8 to 99.3); total volume, 98.3% (97.3 to 99.0); tumor volume, 93.3% (91.6 to 94.8); maximum diameter, 96.3% (94.9 to 97.3); and surgical margin, 98.7% (97.8 to 99.3). Cumulative agreement was 91.3%.

Conclusion: Our proposed NLP pipeline offers new abilities for precise and efficient data management from narrative documentation for clinical research. The scalable approach potentially allows the NLP pipeline to be generalized to other genitourinary EHRs, tumor entities, and other medical disciplines.

Clin Cancer Inform. © 2018 by American Society of Clinical Oncology
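The evaluation design described in the abstract (a random 70%/30% split, then agreement between extracted values and gold standard encodings with a 95% confidence interval) can be illustrated with a minimal sketch. Everything named here is hypothetical: `split_records` and `agreement_with_ci` are illustrative helpers, exact string matching stands in for whatever comparison the authors used, and the Wilson score interval is an assumption, since the abstract does not say how its intervals were computed.

```python
import math
import random


def split_records(record_ids, train_fraction=0.7, seed=0):
    """Randomly split record IDs into (training, test) lists."""
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def agreement_with_ci(extracted, gold, z=1.96):
    """Return (rate, lower, upper) for exact-match agreement.

    The interval is a Wilson score interval (an assumption; the
    abstract does not name the method behind its 95% intervals).
    """
    if len(extracted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(gold)
    matches = sum(e == g for e, g in zip(extracted, gold))
    p = matches / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Splitting 3,679 hypothetical record IDs as in the abstract.
train_ids, test_ids = split_records(range(3679))

# Toy comparison with made-up tumor stage labels (not study data).
gold_labels = ["pT2c"] * 10
extracted_labels = ["pT2c"] * 9 + ["pT3a"]
rate, lower, upper = agreement_with_ci(extracted_labels, gold_labels)
```

With these settings the split yields 2,575 training and 1,104 test records. The toy labels exist only to exercise the function and say nothing about the reported results.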
Pages: 1-9
Page count: 9