Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports

Cited: 0
Authors
Bressem, Keno K. [1 ,2 ,3 ,4 ,5 ]
Adams, Lisa C. [1 ,2 ,3 ,4 ,5 ]
Gaudin, Robert A. [6 ]
Troeltzsch, Daniel [6 ]
Hamm, Bernd [1 ]
Makowski, Marcus R. [7 ]
Schuele, Chan-Yong [1 ]
Vahldiek, Janis L. [1 ]
Niehues, Stefan M. [1 ]
Affiliations
[1] Charite, Dept Radiol, D-12203 Berlin, Germany
[2] Charite Univ Med Berlin, D-10117 Berlin, Germany
[3] Free Univ Berlin, D-10117 Berlin, Germany
[4] Humboldt Univ, D-10117 Berlin, Germany
[5] Berlin Inst Hlth, D-10117 Berlin, Germany
[6] Charite, Dept Oral & Maxillofacial Surg, D-12203 Berlin, Germany
[7] Tech Univ Munich, Sch Med, Dept Diagnost & Intervent Radiol, D-81675 Munich, Germany
Keywords
DOI
Not available
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010; 081704
Abstract
Motivation: The development of deep bidirectional transformers such as Bidirectional Encoder Representations from Transformers (BERT) has led to marked improvements on several Natural Language Processing (NLP) benchmarks. In radiology in particular, large amounts of free-text data are generated in the daily clinical workflow. These report texts could be of particular use for generating labels for machine learning, especially for image classification. However, as report texts are mostly unstructured, advanced NLP methods are needed for accurate text classification. While conventional neural networks can be used for this purpose, they must first be trained on large amounts of manually labelled data to achieve good results. In contrast, BERT models can be pre-trained on unlabelled data and then require fine-tuning on only a small amount of manually labelled data to achieve even better results.

Results: Using BERT to identify the most important findings in intensive care chest radiograph reports, we achieve areas under the receiver operating characteristic curve of 0.98 for congestion, 0.97 for effusion, 0.97 for consolidation and 0.99 for pneumothorax, surpassing the accuracy of previous approaches with comparatively little annotation effort. Our approach could therefore help to improve information extraction from free-text medical reports.

Availability and implementation: The source code for fine-tuning the BERT models is freely available at https://github.com/fast-raidiology/bert-for-radiology. Supplementary information is available at Bioinformatics online.
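The fine-tuning step the abstract describes can be reproduced in outline with the Hugging Face transformers library. The following is a minimal sketch, not the authors' released pipeline: the base checkpoint (bert-base-german-cased), the example reports, the label matrix and the hyperparameters are assumptions for illustration, while the four target findings match those named in the abstract.

# Minimal sketch (not the authors' exact pipeline): fine-tuning a BERT
# checkpoint for multi-label classification of chest radiograph reports.
# Checkpoint name, example reports and labels are illustrative placeholders.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

FINDINGS = ["congestion", "effusion", "consolidation", "pneumothorax"]

# Placeholder training data: one binary label per finding and report.
reports = [
    "Progressive bilateral pleural effusions, no pneumothorax.",
    "Central venous congestion, new consolidation in the left lower lobe.",
]
labels = [[0.0, 1.0, 0.0, 0.0],
          [1.0, 0.0, 1.0, 0.0]]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-german-cased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-german-cased",                   # assumed base checkpoint
    num_labels=len(FINDINGS),
    problem_type="multi_label_classification",  # uses BCE-with-logits loss
)

enc = tokenizer(reports, truncation=True, padding=True, return_tensors="pt")
batch = {**enc, "labels": torch.tensor(labels)}

# One illustrative optimisation step; a real run would loop over batches.
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch).loss   # BCEWithLogitsLoss over the four findings
loss.backward()
optimizer.step()

# Inference: independent sigmoid probability per finding.
model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(**enc).logits)
print({f: round(float(p), 2) for f, p in zip(FINDINGS, probs[0])})

Because each finding is predicted with its own sigmoid output rather than a shared softmax, a single report can be positive for several findings at once, which matches how the four labels are reported independently in the abstract.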
Pages: 5255-5261
Page count: 7