Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports

Cited by: 0
Authors
Bressem, Keno K. [1 ,2 ,3 ,4 ,5 ]
Adams, Lisa C. [1 ,2 ,3 ,4 ,5 ]
Gaudin, Robert A. [6 ]
Troeltzsch, Daniel [6 ]
Hamm, Bernd [1 ]
Makowski, Marcus R. [7 ]
Schuele, Chan-Yong [1 ]
Vahldiek, Janis L. [1 ]
Niehues, Stefan M. [1 ]
Affiliations
[1] Charite, Dept Radiol, D-12203 Berlin, Germany
[2] Charite Univ Med Berlin, D-10117 Berlin, Germany
[3] Free Univ Berlin, D-10117 Berlin, Germany
[4] Humboldt Univ, D-10117 Berlin, Germany
[5] Berlin Inst Hlth, D-10117 Berlin, Germany
[6] Charite, Dept Oral & Maxillofacial Surg, D-12203 Berlin, Germany
[7] Tech Univ Munich, Sch Med, Dept Diagnost & Intervent Radiol, D-81675 Munich, Germany
Keywords: (none listed)
DOI: (not available)
CLC classification: Q5 [Biochemistry]
Discipline codes: 071010; 081704
Abstract
Motivation: The development of deep, bidirectional transformers such as Bidirectional Encoder Representations from Transformers (BERT) has led to state-of-the-art results on several Natural Language Processing (NLP) benchmarks. In radiology in particular, large amounts of free-text data are generated in the daily clinical workflow. These report texts could be of particular use for generating labels for machine learning, especially for image classification. However, as report texts are mostly unstructured, advanced NLP methods are needed for accurate text classification. While neural networks can be used for this purpose, they must first be trained on large amounts of manually labelled data to achieve good results. In contrast, BERT models can be pre-trained on unlabelled data and then require fine-tuning on only a small amount of manually labelled data to achieve even better results.

Results: Using BERT to identify the most important findings in intensive-care chest radiograph reports, we achieve areas under the receiver operating characteristic curve of 0.98 for congestion, 0.97 for effusion, 0.97 for consolidation and 0.99 for pneumothorax, surpassing the accuracy of previous approaches with comparatively little annotation effort. Our approach could therefore help to improve information extraction from free-text medical reports.

Availability and implementation: We make the source code for fine-tuning the BERT models freely available at https://github.com/fast-raidiology/bert-for-radiology. Supplementary information is available at Bioinformatics online.
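The per-finding figures reported in the abstract are areas under the receiver operating characteristic curve (AUROC), computed independently for each of the four findings. As a minimal, self-contained sketch of that evaluation metric — using synthetic labels and scores, not the authors' data or code — AUROC can be computed as the probability that a randomly chosen positive report receives a higher classifier score than a randomly chosen negative one:

```python
import numpy as np

# The four findings evaluated in the paper.
FINDINGS = ["congestion", "effusion", "consolidation", "pneumothorax"]

def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive example outscores a randomly chosen
    negative one (ties counted as one half)."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Synthetic stand-in for a fine-tuned classifier's output: one binary
# label column per finding, and scores correlated with the labels.
rng = np.random.default_rng(0)
n_reports = 200
y_true = rng.integers(0, 2, size=(n_reports, len(FINDINGS)))
y_score = 0.6 * y_true + rng.random((n_reports, len(FINDINGS)))

# One AUROC per finding, as in the paper's Results section.
aurocs = {name: auroc(y_true[:, i], y_score[:, i])
          for i, name in enumerate(FINDINGS)}
for name, value in aurocs.items():
    print(f"{name}: {value:.2f}")
```

In the paper itself, the scores would come from a BERT model fine-tuned on the labelled report subset (the authors' fine-tuning code is at the GitHub link above); the synthetic data here only illustrates how the reported per-finding AUROCs are defined.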
Pages: 5255-5261 (7 pages)
Source: BIOINFORMATICS, 2020, 36(21): 5255-5261