The conventional process of generating radiology reports is labor-intensive and time-consuming, requiring radiologists to meticulously describe findings from imaging studies. This manual approach often causes undesirable delays in patient care. Despite advances in computer vision and deep learning, developing an effective computer-aided solution for automated medical report generation remains challenging. Recent progress in deep learning, particularly the advent of contrastive learning, has shown strong performance under natural language supervision. However, its application to medical report generation, particularly for chest X-rays (CXR), has been limited by the lack of large annotated datasets. Many studies have proposed multimodal contrastive learning schemes to address data scarcity for natural images, yet these techniques remain largely unexplored for medical report generation. This study addresses these challenges by proposing a dual contrastive learning network (DuCo-Net) comprising a backbone network and an augmented network. The backbone network is trained on the original data, while the augmented network emphasizes cross-modal augmentation learning within a unified framework. DuCo-Net enables two complementary learning mechanisms: intra-modal learning, in which each network learns specialized features within its own modality (image or text), and inter-modal learning, which captures relationships between the image and text modalities through a combined loss function. This dual learning approach leverages modified DenseNet121 and BioBERT models with advanced pooling techniques tailored to medical data. Comprehensive evaluations on two publicly available datasets demonstrate that DuCo-Net significantly outperforms current benchmarks. On the Indiana University Chest X-ray dataset, the proposed method achieves marked improvements across standard metrics (BLEU-1: 0.50, ROUGE: 0.40, METEOR: 0.24, F1: 0.40). On the MIMIC-CXR dataset, the framework maintains robust performance (BLEU-1: 0.42, ROUGE: 0.34, METEOR: 0.20, F1: 0.34), representing substantial improvements over existing state-of-the-art approaches in medical report generation.
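Since the abstract describes the combined intra-/inter-modal objective only at a high level, the sketch below illustrates one way such a loss could be assembled. It is a minimal illustration under stated assumptions: the function names (`info_nce`, `duco_style_loss`), the temperature, and the weights `w_intra`/`w_inter` are hypothetical and not taken from the paper.

```python
# Illustrative sketch: combining intra-modal (original vs. augmented view) and
# inter-modal (image vs. report) contrastive terms into one loss. Names and
# hyperparameters are assumptions for demonstration, not the paper's method.
import torch
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE loss: matched rows of `a` and `b` are positives, all other rows negatives."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                      # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # diagonal entries are positives
    return F.cross_entropy(logits, targets)


def duco_style_loss(img_emb, img_emb_aug, txt_emb, txt_emb_aug, w_intra=1.0, w_inter=1.0):
    """Combine intra-modal and inter-modal contrastive terms with assumed weights."""
    # Intra-modal: each network contrasts its original and augmented views.
    l_intra = info_nce(img_emb, img_emb_aug) + info_nce(txt_emb, txt_emb_aug)
    # Inter-modal: align paired image and report embeddings (symmetric, CLIP-style).
    l_inter = 0.5 * (info_nce(img_emb, txt_emb) + info_nce(txt_emb, img_emb))
    return w_intra * l_intra + w_inter * l_inter


if __name__ == "__main__":
    n, d = 8, 256  # batch size and embedding dimension, arbitrary for the demo
    loss = duco_style_loss(torch.randn(n, d), torch.randn(n, d),
                           torch.randn(n, d), torch.randn(n, d))
    print(f"combined contrastive loss: {loss.item():.4f}")
```

In practice the image embeddings would come from the modified DenseNet121 encoder and the text embeddings from BioBERT, with the backbone network fed original data and the augmented network fed augmented views; the weighting between intra- and inter-modal terms is shown here only as a placeholder.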