Large Scale Semi-Automated Labeling of Routine Free-Text Clinical Records for Deep Learning

Cited by: 14
Authors
Trivedi, Hari M. [1 ]
Panahiazar, Maryam [2 ]
Liang, April [3 ]
Lituiev, Dmytro [2 ]
Chang, Peter [1 ]
Sohn, Jae Ho [1 ]
Chen, Yunn-Yi [4 ]
Franc, Benjamin L. [1 ]
Joe, Bonnie [1 ]
Hadley, Dexter [2 ]
Affiliations
[1] Univ Calif San Francisco, Dept Radiol & Biomed Imaging, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, Inst Computat Hlth Sci, San Francisco, CA 94143 USA
[3] Univ Calif San Francisco, Sch Med, San Francisco, CA USA
[4] Univ Calif San Francisco, Dept Pathol, San Francisco, CA 94140 USA
Keywords
IBM Watson; Machine learning; Artificial intelligence; Deep learning; Natural language processing (NLP); Pathology; Mammography; CANCER; CLASSIFICATION; ARCHITECTURE; MAMMOGRAPHY; MASSES;
DOI
10.1007/s10278-018-0105-8
Chinese Library Classification number
R8 [Special medicine]; R445 [Diagnostic imaging];
Subject classification code
1002 ; 100207 ; 1009 ;
Abstract
Breast cancer is a leading cause of cancer death among women in the USA. Screening mammography is effective in reducing mortality, but has a high rate of unnecessary recalls and biopsies. While deep learning can be applied to mammography, large-scale labeled datasets, which are difficult to obtain, are required. We aim to remove many barriers of dataset development by automatically harvesting data from existing clinical records using a hybrid framework combining traditional NLP and IBM Watson. An expert reviewer manually annotated 3521 breast pathology reports with one of four outcomes: left positive, right positive, bilateral positive, negative. Traditional NLP techniques using seven different machine learning classifiers were compared to IBM Watson's automated natural language classifier. Techniques were evaluated using precision, recall, and F-measure. Logistic regression outperformed all other traditional machine learning classifiers and was used for subsequent comparisons. Both traditional NLP and Watson's NLC performed well for cases under 1024 characters with weighted average F-measures above 0.96 across all classes. Performance of traditional NLP was lower for cases over 1024 characters with an F-measure of 0.83. We demonstrate a hybrid framework using traditional NLP techniques combined with IBM Watson to annotate over 10,000 breast pathology reports for development of a large-scale database to be used for deep learning in mammography. Our work shows that traditional NLP and IBM Watson perform extremely well for cases under 1024 characters and can accelerate the rate of data annotation.
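The evaluation described in the abstract (precision, recall, and a weighted average F-measure over the four outcome classes, reported separately for reports under and over 1024 characters) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the F-measure is assumed to be F1 weighted by class support, and the treatment of reports of exactly 1024 characters is an assumption, since the abstract does not specify it.

```python
from collections import Counter

# The four outcome labels named in the abstract.
LABELS = ("left positive", "right positive", "bilateral positive", "negative")
LENGTH_CUTOFF = 1024  # characters; the cutoff reported in the abstract


def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, F1) for one class; 0.0 where undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def weighted_f_measure(y_true, y_pred, labels=LABELS):
    """Average per-class F1, weighting each class by its number of true cases."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        per_class_scores(y_true, y_pred, lab)[2] * support[lab] for lab in labels
    ) / total


def split_by_length(reports, cutoff=LENGTH_CUTOFF):
    """Split report texts into (short, long) groups.

    Assumption: a report of exactly `cutoff` characters counts as long.
    """
    short = [r for r in reports if len(r) < cutoff]
    long_ = [r for r in reports if len(r) >= cutoff]
    return short, long_
```

Scoring each length group separately with weighted_f_measure yields figures comparable in kind to the per-group F-measures quoted in the abstract.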
Pages: 30-37
Page count: 8
Related papers
50 in total
  • [11] Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records
    Afzal, Zubair
    Schuemie, Martijn J.
    van Blijderveen, Jan C.
    Sen, Elif F.
    Sturkenboom, Miriam C. J. M.
    Kors, Jan A.
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2013, 13
  • [12] Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records
    Zubair Afzal
    Martijn J Schuemie
    Jan C van Blijderveen
    Elif F Sen
    Miriam CJM Sturkenboom
    Jan A Kors
    BMC Medical Informatics and Decision Making, 13
  • [14] Semi-Automated, Large-Scale Evaluation of Public Displays
    Makela, Ville
    Heimonen, Tomi
    Turunen, Markku
    INTERNATIONAL JOURNAL OF HUMAN-COMPUTER INTERACTION, 2018, 34 (06) : 491 - 505
  • [15] Automated Identification of Clinical Procedures in Free-Text Electronic Clinical Records with a Low-Code Named Entity Recognition Workflow
    Macri, Carmelo
    Teoh, Ian
    Bacchi, Stephen
    Sun, Michelle
    Selva, Dinesh
    Casson, Robert
    Chan, WengOnn
    METHODS OF INFORMATION IN MEDICINE, 2022, 61 (03/04) : 84 - 89
  • [16] Semi-automated segmentation of ONH tissues using deep learning
    Clingo, Kelly A.
    Czerpak, Cameron A.
    Quigley, Harry A.
    Nguyen, Thao D.
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2024, 65 (07)
  • [17] Outlier Detection in Health Record Free-Text using Deep Learning
    Wallace, Duncan
    Kechadi, Tahar
    2019 41ST ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2019, : 550 - 555
  • [18] SEMI-AUTOMATED LARGE-SCALE EXPANSION OF INTRAHEPATIC CHOLANGIOCYTE ORGANOIDS
    ten Dam, M.
    Sam, J.
    Ne, E.
    van Uden, L.
    Das, R.
    Spee, B.
    CYTOTHERAPY, 2023, 25 (06) : S149 - S150
  • [19] Semi-automated creation of reciprocal frame structures using deep learning
    Agirbas, Asli
    AUTOMATION IN CONSTRUCTION, 2024, 165
  • [20] Affordable High Throughput Field Detection of Wheat Stripe Rust Using Deep Learning with Semi-Automated Image Labeling
    Tang, Zhou
    Wang, Meinan
    Schirrmann, Michael
    Li, Xianran
    Brueggeman, Robert
    Sankaran, Sindhuja
    Carter, Arron H.
    Pumphrey, Michael O.
    Hu, Yang
    Chen, Xianming
    Zhang, Zhiwu
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2023, 207