Decoupled contrastive learning for multilingual multimodal medical pre-trained model

Citations: 0
Authors
Li, Qiyuan [1,2,3,4]
Qiu, Chen [1,2,3,4]
Liu, Haijiang [1,2,3,4]
Gu, Jinguang [1,2,3,4]
Luo, Dan [5]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Comp Sci & Technol, Wuhan 430065, Hubei, Peoples R China
[2] Hubei Prov Key Lab Intelligent Informat Proc & Rea, Wuhan 430065, Hubei, Peoples R China
[3] Inst Sci & Tech Informat China, Key Lab Rich Media Knowledge Org, Beijing 100038, Peoples R China
[4] Inst Sci & Tech Informat China, Serv Digital Publishing Content, Beijing 100038, Peoples R China
[5] Lehigh Univ, Dept Comp Sci & Engn, Bethlehem, PA 18015 USA
Keywords
Multilingual multimodal learning; Decoupled contrastive learning; Medical pre-training model
DOI
10.1016/j.neucom.2025.129809
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Multilingual multimodal pre-training aims to integrate conceptual representations across diverse languages and modalities within a shared high-dimensional semantic space. In healthcare, this endeavor faces challenges of language diversity, suboptimal multimodal interactions, and the absence of coherent multilingual multimodal representations. To address these challenges, we introduce a novel multilingual multimodal medical pre-training model. First, we augment the medical corpus by expanding the MIMIC-CXR report dataset into 20 distinct languages using machine translation. We then develop a targeted label disambiguation technique to address labeling noise within decoupled contrastive learning: it categorizes and refines uncertain phrases in the clinical reports by disease type, promoting finer-grained semantic similarity and improving inter-modality interactions. Building on these components, we present a refined multilingual multimodal medical pre-trained model that significantly enhances the understanding of medical multimodal data and adapts to multilingual medical contexts. Experiments show that our model outperforms other baselines in medical image classification and multilingual medical image-text retrieval by up to 13.78% and 12.6%, respectively.
Pages: 17
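The decoupled contrastive learning named in the abstract generally refers to removing the positive pair from the InfoNCE denominator, so that the positive and negative gradient terms no longer couple. Below is a minimal PyTorch sketch of that generic image-text objective, offered only as an illustration: the function name, temperature value, and symmetric two-direction averaging are assumptions, and the paper's label disambiguation step for uncertain report phrases is not reproduced here.

```python
import torch
import torch.nn.functional as F

def decoupled_contrastive_loss(img_emb, txt_emb, tau=0.07):
    """Generic decoupled contrastive (DCL-style) image-text loss.

    Unlike standard InfoNCE, the matched positive pair is excluded from
    each softmax denominator, decoupling positive and negative gradients.
    Hypothetical sketch; not the paper's exact objective.
    """
    img = F.normalize(img_emb, dim=-1)   # (B, D) unit-norm image embeddings
    txt = F.normalize(txt_emb, dim=-1)   # (B, D) unit-norm text embeddings
    logits = img @ txt.t() / tau         # (B, B) scaled cosine similarities
    pos = logits.diag()                  # similarities of matched pairs
    mask = torch.eye(logits.size(0), dtype=torch.bool, device=logits.device)
    # Decoupling: drop the positive term from each denominator before logsumexp.
    neg_i2t = logits.masked_fill(mask, float("-inf")).logsumexp(dim=1)
    neg_t2i = logits.t().masked_fill(mask, float("-inf")).logsumexp(dim=1)
    # Average the image-to-text and text-to-image directions.
    return 0.5 * ((neg_i2t - pos) + (neg_t2i - pos)).mean()
```

With a batch of B paired image-report embeddings, each image treats the other B-1 reports as negatives (and vice versa); in the multilingual setting described in the abstract, translated reports would enter as additional pairs, which this sketch omits.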