Large Scale Semi-Automated Labeling of Routine Free-Text Clinical Records for Deep Learning

Cited by: 14
Authors
Trivedi, Hari M. [1]
Panahiazar, Maryam [2]
Liang, April [3]
Lituiev, Dmytro [2]
Chang, Peter [1]
Sohn, Jae Ho [1]
Chen, Yunn-Yi [4]
Franc, Benjamin L. [1]
Joe, Bonnie [1]
Hadley, Dexter [2]
Affiliations
[1] Univ Calif San Francisco, Dept Radiol & Biomed Imaging, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, Inst Computat Hlth Sci, San Francisco, CA 94143 USA
[3] Univ Calif San Francisco, Sch Med, San Francisco, CA USA
[4] Univ Calif San Francisco, Dept Pathol, San Francisco, CA 94140 USA
Keywords
IBM Watson; Machine learning; Artificial intelligence; Deep learning; Natural language processing (NLP); Pathology; Mammography; CANCER; CLASSIFICATION; ARCHITECTURE; MAMMOGRAPHY; MASSES
DOI
10.1007/s10278-018-0105-8
Chinese Library Classification number
R8 [Special medicine]; R445 [Diagnostic imaging]
Discipline classification code
1002; 100207; 1009
Abstract
Breast cancer is a leading cause of cancer death among women in the USA. Screening mammography is effective in reducing mortality, but has a high rate of unnecessary recalls and biopsies. While deep learning can be applied to mammography, large-scale labeled datasets, which are difficult to obtain, are required. We aim to remove many barriers of dataset development by automatically harvesting data from existing clinical records using a hybrid framework combining traditional NLP and IBM Watson. An expert reviewer manually annotated 3521 breast pathology reports with one of four outcomes: left positive, right positive, bilateral positive, negative. Traditional NLP techniques using seven different machine learning classifiers were compared to IBM Watson's automated natural language classifier. Techniques were evaluated using precision, recall, and F-measure. Logistic regression outperformed all other traditional machine learning classifiers and was used for subsequent comparisons. Both traditional NLP and Watson's NLC performed well for cases under 1024 characters with weighted average F-measures above 0.96 across all classes. Performance of traditional NLP was lower for cases over 1024 characters with an F-measure of 0.83. We demonstrate a hybrid framework using traditional NLP techniques combined with IBM Watson to annotate over 10,000 breast pathology reports for development of a large-scale database to be used for deep learning in mammography. Our work shows that traditional NLP and IBM Watson perform extremely well for cases under 1024 characters and can accelerate the rate of data annotation.
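The evaluation described in the abstract (precision, recall, and a weighted average F-measure, reported separately for reports under and over 1024 characters) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the dictionary layout of a report, and the treatment of a report of exactly 1024 characters as "short" are assumptions made for illustration only.

```python
from collections import Counter

# The four outcome labels named in the abstract.
LABELS = ["left positive", "right positive", "bilateral positive", "negative"]

# Threshold named in the abstract; treating exactly 1024 characters
# as "short" is an assumption of this sketch.
CHAR_LIMIT = 1024


def weighted_f_measure(y_true, y_pred, labels=LABELS):
    """Return the support-weighted average of per-class F-measures."""
    total = len(y_true)
    if total == 0:
        return 0.0
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += f1 * support[label] / total
    return score


def split_by_length(reports, limit=CHAR_LIMIT):
    """Split reports (dicts with a hypothetical 'text' key) by character count."""
    short = [r for r in reports if len(r["text"]) <= limit]
    long_ = [r for r in reports if len(r["text"]) > limit]
    return short, long_
```

Scoring the short and long groups separately, as the abstract does, makes a drop in performance on long reports visible instead of letting it be averaged away by the more numerous short reports.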
Pages: 30-37
Page count: 8