Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports

Cited by: 0
Authors
Bressem, Keno K. [1 ,2 ,3 ,4 ,5 ]
Adams, Lisa C. [1 ,2 ,3 ,4 ,5 ]
Gaudin, Robert A. [6 ]
Troeltzsch, Daniel [6 ]
Hamm, Bernd [1 ]
Makowski, Marcus R. [7 ]
Schuele, Chan-Yong [1 ]
Vahldiek, Janis L. [1 ]
Niehues, Stefan M. [1 ]
Affiliations
[1] Charite, Dept Radiol, D-12203 Berlin, Germany
[2] Charite Univ Med Berlin, D-10117 Berlin, Germany
[3] Free Univ Berlin, D-10117 Berlin, Germany
[4] Humboldt Univ, D-10117 Berlin, Germany
[5] Berlin Inst Hlth, D-10117 Berlin, Germany
[6] Charite, Dept Oral & Maxillofacial Surg, D-12203 Berlin, Germany
[7] Tech Univ Munich, Sch Med, Dept Diagnost & Intervent Radiol, D-81675 Munich, Germany
Keywords: (none listed)
DOI: (not available)
CLC classification: Q5 [Biochemistry]
Discipline codes: 071010; 081704
Abstract
Motivation: The development of deep, bidirectional transformers such as Bidirectional Encoder Representations from Transformers (BERT) has led to state-of-the-art results on several Natural Language Processing (NLP) benchmarks. In radiology in particular, large amounts of free-text data are generated in the daily clinical workflow. These report texts could be of particular use for generating labels for machine learning, especially for image classification. However, as report texts are mostly unstructured, advanced NLP methods are needed for accurate text classification. While neural networks can be used for this purpose, they must first be trained on large amounts of manually labelled data to achieve good results. In contrast, BERT models can be pre-trained on unlabelled data and then require fine-tuning on only a small amount of manually labelled data to achieve even better results.

Results: Using BERT to identify the most important findings in intensive-care chest radiograph reports, we achieve areas under the receiver operating characteristic curve of 0.98 for congestion, 0.97 for effusion, 0.97 for consolidation and 0.99 for pneumothorax, surpassing the accuracy of previous approaches with comparatively little annotation effort. Our approach could therefore help to improve information extraction from free-text medical reports.

Availability and implementation: We make the source code for fine-tuning the BERT models freely available at https://github.com/fast-raidiology/bert-for-radiology. Supplementary information is available at Bioinformatics online.
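The per-finding figures reported in the abstract are areas under the receiver operating characteristic curve (AUROC), computed independently for each of the four findings. As a minimal, self-contained sketch of that evaluation metric — using synthetic labels and scores, not the authors' data or code — AUROC can be computed as the probability that a randomly chosen positive report receives a higher classifier score than a randomly chosen negative one:

```python
import numpy as np

# The four findings evaluated in the paper.
FINDINGS = ["congestion", "effusion", "consolidation", "pneumothorax"]

def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive example outscores a randomly chosen
    negative one (ties counted as one half)."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Synthetic stand-in for a fine-tuned classifier's output: one binary
# label column per finding, and scores correlated with the labels.
rng = np.random.default_rng(0)
n_reports = 200
y_true = rng.integers(0, 2, size=(n_reports, len(FINDINGS)))
y_score = 0.6 * y_true + rng.random((n_reports, len(FINDINGS)))

# One AUROC per finding, as in the paper's Results section.
aurocs = {name: auroc(y_true[:, i], y_score[:, i])
          for i, name in enumerate(FINDINGS)}
for name, value in aurocs.items():
    print(f"{name}: {value:.2f}")
```

In the paper itself, the scores would come from a BERT model fine-tuned on the labelled report subset (the authors' fine-tuning code is at the GitHub link above); the synthetic data here only illustrates how the reported per-finding AUROCs are defined.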
Pages: 5255-5261 (7 pages)
Source: BIOINFORMATICS, 2020, 36(21): 5255-5261