Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports

Cited by: 0

Authors
Bressem, Keno K. [1 ,2 ,3 ,4 ,5 ]
Adams, Lisa C. [1 ,2 ,3 ,4 ,5 ]
Gaudin, Robert A. [6 ]
Troeltzsch, Daniel [6 ]
Hamm, Bernd [1 ]
Makowski, Marcus R. [7 ]
Schuele, Chan-Yong [1 ]
Vahldiek, Janis L. [1 ]
Niehues, Stefan M. [1 ]
Affiliations
[1] Charité, Dept Radiol, D-12203 Berlin, Germany
[2] Charité Univ Med Berlin, D-10117 Berlin, Germany
[3] Free Univ Berlin, D-10117 Berlin, Germany
[4] Humboldt Univ, D-10117 Berlin, Germany
[5] Berlin Inst Hlth, D-10117 Berlin, Germany
[6] Charité, Dept Oral & Maxillofacial Surg, D-12203 Berlin, Germany
[7] Tech Univ Munich, Sch Med, Dept Diagnost & Intervent Radiol, D-81675 Munich, Germany
Keywords
DOI
Not available
Chinese Library Classification
Q5 [Biochemistry]
Subject Classification Codes
071010; 081704
Abstract
Motivation: The development of deep bidirectional transformers such as Bidirectional Encoder Representations from Transformers (BERT) has led to models that outperform earlier approaches on several Natural Language Processing (NLP) benchmarks. In radiology in particular, large amounts of free-text data are generated in the daily clinical workflow. These report texts could be of particular use for generating labels for machine learning, especially for image classification. However, as report texts are mostly unstructured, advanced NLP methods are needed for accurate text classification. While neural networks can be used for this purpose, they must first be trained on large amounts of manually labelled data to achieve good results. In contrast, BERT models can be pre-trained on unlabelled data and then require fine-tuning on only a small amount of manually labelled data to achieve even better results.

Results: Using BERT to identify the most important findings in intensive care chest radiograph reports, we achieve areas under the receiver operating characteristic curve of 0.98 for congestion, 0.97 for effusion, 0.97 for consolidation and 0.99 for pneumothorax, surpassing the accuracy of previous approaches with comparatively little annotation effort. Our approach could therefore help to improve information extraction from free-text medical reports.

Availability and implementation: The source code for fine-tuning the BERT models is freely available at https://github.com/fast-raidiology/bert-for-radiology. Supplementary information is available at Bioinformatics online.
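To make the workflow the abstract describes concrete, below is a minimal sketch of fine-tuning a pre-trained BERT model for multi-label classification of report texts, assuming the Hugging Face transformers library. It is not the authors' implementation (see the repository linked above for that); the checkpoint name and the toy report snippets with their labels are hypothetical placeholders.

```python
# Minimal sketch (not the authors' code): fine-tune a pre-trained BERT model
# for multi-label classification of radiology reports. The checkpoint name
# and the toy reports/labels below are hypothetical placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["congestion", "effusion", "consolidation", "pneumothorax"]

CHECKPOINT = "bert-base-german-cased"  # assumed checkpoint; reports are German
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT,
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # sigmoid + BCE loss per label
)

# Two hypothetical, manually labelled report snippets with multi-hot targets.
reports = [
    "Progredienter Pleuraerguss rechts. Kein Pneumothorax.",
    "Neue Konsolidierung im linken Unterfeld.",
]
targets = torch.tensor([[0.0, 1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 0.0]])

batch = tokenizer(reports, padding=True, truncation=True,
                  max_length=512, return_tensors="pt")

# One illustrative optimisation step; a real run would loop over many batches.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
out = model(**batch, labels=targets)  # uses BCEWithLogitsLoss internally
out.loss.backward()
optimizer.step()

# At inference, per-finding probabilities come from a sigmoid over the logits;
# scores like these are what AUROC values are computed from
# (e.g. with sklearn.metrics.roc_auc_score on a held-out test set).
model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(**batch).logits)
print(dict(zip(LABELS, probs[0].tolist())))
```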
Pages: 5255-5261
Number of pages: 7
Related Papers
50 records in total (items [21]-[30] shown)
  • [21] LMPred: predicting antimicrobial peptides using pre-trained language models and deep learning
    Dee, William
    Gromiha, Michael
    BIOINFORMATICS ADVANCES, 2022, 2 (01)
  • [22] A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
    Li, Yikuan
    Wang, Hanyin
    Luo, Yuan
    2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020, : 1999 - 2004
  • [23] ZeroAE: Pre-trained Language Model based Autoencoder for Transductive Zero-shot Text Classification
    Guo, Kaihao
    Yu, Hang
    Liao, Cong
    Li, Jianguo
    Zhang, Haipeng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 3202 - 3219
  • [24] Rule-based Natural Language Processing Approach to Detect Delirium on a Pre-Trained Deep Learning Model Framework
    Munoz, Ricardo
    Hua, Yining
    Seibold, Eva-Lotte
    Ahrens, Elena
    Redaelli, Simone
    Suleiman, Aiman
    von Wedel, Dario
    Ashrafian, Sarah
    Chen, Guanqing
    Schaefer, Maximilian
    Ma, Haobo
    ANESTHESIA AND ANALGESIA, 2023, 136 : 1028 - 1030
  • [25] Abusive and Hate speech Classification in Arabic Text Using Pre-trained Language Models and Data Augmentation
    Badri, Nabil
    Kboubi, Ferihane
    Chaibi, Anja Habacha
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (11)
  • [26] Using a Pre-Trained Language Model for Medical Named Entity Extraction in Chinese Clinic Text
    Zhang, Mengyuan
    Wang, Jin
    Zhang, Xuejie
    PROCEEDINGS OF 2020 IEEE 10TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2020), 2020, : 312 - 317
  • [27] Pre-trained Deep Learning Models for Chest X-Rays' Classification: Views and Age-Groups
    Farhat, Hanan
    Jabbour, Joey
    Sakr, Georges E.
    Kilany, Rima
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, INTELLISYS 2023, 2024, 823 : 71 - 82
  • [28] Improving the accuracy using pre-trained word embeddings on deep neural networks for Turkish text classification
    Aydogan, Murat
    Karci, Ali
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2020, 541
  • [29] Deep learning for natural language processing of free-text pathology reports: a comparison of learning curves
    Senders, Joeky T.
    Cote, David J.
    Mehrtash, Alireza
    Wiemann, Robert
    Gormley, William B.
    Smith, Timothy R.
    Broekman, Marike L. D.
    Arnaout, Omar
    BMJ INNOVATIONS, 2020, 6 (04) : 192 - 198
  • [30] Using Large Pre-Trained Language Model to Assist FDA in Premarket Medical Device Classification
    Xu, Zongzhe
    SOUTHEASTCON 2023, 2023, : 159 - 166