CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer

Cited: 0
Authors
Wang, Yabing [1 ,2 ,3 ,6 ]
Wang, Fan [3 ]
Dong, Jianfeng [1 ,5 ]
Luo, Hao [3 ,4 ]
Affiliations
[1] Zhejiang Gongshang Univ, Hangzhou, Peoples R China
[2] Xi An Jiao Tong Univ, Xian, Peoples R China
[3] Alibaba Grp, DAMO Acad, Hangzhou, Peoples R China
[4] Hupan Lab, Hangzhou, Zhejiang, Peoples R China
[5] Zhejiang Key Lab E Commerce, Hangzhou, Peoples R China
[6] DAMO Acad, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
(not available)
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Cross-lingual cross-modal retrieval has garnered increasing attention recently. It aims to achieve alignment between vision and a target language (V-T) without using any annotated V-T data pairs. Current methods employ machine translation (MT) to construct pseudo-parallel data pairs, which are then used to learn a multi-lingual and multi-modal embedding space that aligns visual and target-language representations. However, the large heterogeneity gap between vision and text, along with the noise present in target-language translations, poses significant challenges to effectively aligning their representations. To address these challenges, we propose a general framework, Cross-Lingual to Cross-Modal (CL2CM), which improves the alignment between vision and the target language using cross-lingual transfer. This approach allows us to fully leverage the merits of multi-lingual pre-trained models (e.g., mBERT) and the benefits of same-modality structure, i.e., a smaller gap, to provide reliable and comprehensive semantic correspondence (knowledge) for the cross-modal network. We evaluate our proposed approach on two multi-lingual image-text datasets, Multi30K and MSCOCO, and one video-text dataset, VATEX. The results clearly demonstrate the effectiveness of our proposed method and its high potential for large-scale retrieval.
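The core idea in the abstract, transferring relational knowledge from the easier cross-lingual (text-to-text) space to guide the harder cross-modal (vision-to-text) alignment, can be illustrated with a small relation-distillation sketch. This is a minimal pure-Python illustration, not the paper's exact formulation: the function names (`distill_loss`, `cosine`) and the choice of KL divergence over softmax-normalized similarity rows are assumptions made here for clarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def softmax(xs, temp=1.0):
    """Temperature-scaled softmax over one similarity row."""
    m = max(xs)
    exps = [math.exp((x - m) / temp) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_loss(teacher_sims, student_sims, temp=1.0):
    """Transfer the teacher's relational structure to the student.

    teacher_sims: rows of cross-lingual similarities (source text vs.
                  target-language texts), assumed more reliable because
                  both sides are in the same (text) modality.
    student_sims: rows of cross-modal similarities (image/video vs.
                  target-language texts), which we want to mimic the
                  teacher's soft correspondence.
    """
    loss = 0.0
    for t_row, s_row in zip(teacher_sims, student_sims):
        p = softmax(t_row, temp)  # teacher's soft matching distribution
        q = softmax(s_row, temp)  # student's soft matching distribution
        loss += kl(p, q)
    return loss / len(teacher_sims)
```

When the cross-modal similarities already match the cross-lingual ones, the loss is zero; any disagreement yields a positive penalty that pulls the cross-modal network toward the cross-lingual network's semantic correspondence.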
Pages: 5651-5659
Page count: 9
Related Papers (50 items)
  • [1] Cross-lingual Cross-modal Pretraining for Multimodal Retrieval
    Fei, Hongliang
    Yu, Tan
    Li, Ping
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 3644 - 3650
  • [2] Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning
    Wang, Yabing
    Dong, Jianfeng
    Liang, Tianxiang
    Zhang, Minsong
    Cai, Rui
    Wang, Xun
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022,
  • [3] CROSS2STRA: Unpaired Cross-lingual Image Captioning with Cross-lingual Cross-modal Structure-pivoted Alignment
    Wu, Shengqiong
    Fei, Hao
    Ji, Wei
    Chua, Tat-Seng
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 2593 - 2608
  • [4] CCIM: Cross-modal Cross-lingual Interactive Image Translation
    Ma, Cong
    Zhang, Yaping
    Tu, Mei
    Zhao, Yang
    Zhou, Yu
    Zong, Chengqing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 4959 - 4965
  • [5] Cross-Lingual Phrase Retrieval
    Zheng, Heqi
    Zhang, Xiao
    Chi, Zewen
    Huang, Heyan
    Yan, Tan
    Lan, Tian
    Wei, Wei
    Mao, Xian-Ling
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 4193 - 4204
  • [6] Cross-Lingual Cross-Modal Retrieval With Noise-Robust Fine-Tuning
    Cai, Rui
    Dong, Jianfeng
    Liang, Tianxiang
    Liang, Yonghui
    Wang, Yabing
    Yang, Xun
    Wang, Xun
    Wang, Meng
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (11) : 5860 - 5873
  • [7] Dual-View Curricular Optimal Transport for Cross-Lingual Cross-Modal Retrieval
    Wang, Yabing
    Wang, Shuhui
    Luo, Hao
    Dong, Jianfeng
    Wang, Fan
    Han, Meng
    Wang, Xun
    Wang, Meng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 1522 - 1533
  • [8] Cross-Lingual Knowledge Transfer for Clinical Phenotyping
    Papaioannou, Jens-Michalis
    Grundmann, Paul
    van Aken, Betty
    Samaras, Athanasios
    Kyparissidis, Ilias
    Giannakoulas, George
    Gers, Felix
    Loeser, Alexander
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 900 - 909
  • [9] A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval
    Ghanbari, Elham
    Shakery, Azadeh
    APPLIED INTELLIGENCE, 2022, 52 (03) : 3156 - 3174