Decoupled contrastive learning for multilingual multimodal medical pre-trained model

Citations: 0
Authors
Li, Qiyuan [1,2,3,4]
Qiu, Chen [1,2,3,4]
Liu, Haijiang [1,2,3,4]
Gu, Jinguang [1,2,3,4]
Luo, Dan [5]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Comp Sci & Technol, Wuhan 430065, Hubei, Peoples R China
[2] Hubei Prov Key Lab Intelligent Informat Proc & Rea, Wuhan 430065, Hubei, Peoples R China
[3] Inst Sci & Tech Informat China, Key Lab Rich Media Knowledge Org, Beijing 100038, Peoples R China
[4] Inst Sci & Tech Informat China, Serv Digital Publishing Content, Beijing 100038, Peoples R China
[5] Lehigh Univ, Dept Comp Sci & Engn, Bethlehem, PA 18015 USA
Keywords
Multilingual multimodal learning; Decoupled contrastive learning; Medical pre-training model
DOI
10.1016/j.neucom.2025.129809
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Multilingual multimodal pre-training aims to integrate conceptual representations across diverse languages and modalities within a shared high-dimensional semantic space. In healthcare, this endeavor faces challenges of language diversity, suboptimal multimodal interactions, and the absence of coherent multilingual multimodal representations. To address these challenges, we introduce a novel multilingual multimodal medical pre-training model. First, we augment the medical corpus by expanding the MIMIC-CXR report dataset into 20 distinct languages using machine translation. We then develop a targeted label disambiguation technique to address labeling noise within decoupled contrastive learning: it categorizes and refines uncertain phrases in the clinical reports by disease type, promoting finer-grained semantic similarity and improving inter-modality interactions. Building on these components, we present a refined multilingual multimodal medical pre-trained model that significantly enhances the understanding of medical multimodal data and adapts to multilingual medical contexts. Experiments show that our model outperforms other baselines in medical image classification and multilingual medical image-text retrieval by up to 13.78% and 12.6%, respectively.
Pages: 17
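The decoupled contrastive learning named in the abstract generally refers to removing the positive pair from the InfoNCE denominator, so that the positive and negative gradient terms no longer couple. Below is a minimal PyTorch sketch of that generic image-text objective, offered only as an illustration: the function name, temperature value, and symmetric two-direction averaging are assumptions, and the paper's label disambiguation step for uncertain report phrases is not reproduced here.

```python
import torch
import torch.nn.functional as F

def decoupled_contrastive_loss(img_emb, txt_emb, tau=0.07):
    """Generic decoupled contrastive (DCL-style) image-text loss.

    Unlike standard InfoNCE, the matched positive pair is excluded from
    each softmax denominator, decoupling positive and negative gradients.
    Hypothetical sketch; not the paper's exact objective.
    """
    img = F.normalize(img_emb, dim=-1)   # (B, D) unit-norm image embeddings
    txt = F.normalize(txt_emb, dim=-1)   # (B, D) unit-norm text embeddings
    logits = img @ txt.t() / tau         # (B, B) scaled cosine similarities
    pos = logits.diag()                  # similarities of matched pairs
    mask = torch.eye(logits.size(0), dtype=torch.bool, device=logits.device)
    # Decoupling: drop the positive term from each denominator before logsumexp.
    neg_i2t = logits.masked_fill(mask, float("-inf")).logsumexp(dim=1)
    neg_t2i = logits.t().masked_fill(mask, float("-inf")).logsumexp(dim=1)
    # Average the image-to-text and text-to-image directions.
    return 0.5 * ((neg_i2t - pos) + (neg_t2i - pos)).mean()
```

With a batch of B paired image-report embeddings, each image treats the other B-1 reports as negatives (and vice versa); in the multilingual setting described in the abstract, translated reports would enter as additional pairs, which this sketch omits.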