Cross-Lingual Cross-Modal Retrieval With Noise-Robust Fine-Tuning

Cited by: 0
Authors
Cai, Rui [1, 2]
Dong, Jianfeng [1, 3]
Liang, Tianxiang [1, 2]
Liang, Yonghui [1, 2]
Wang, Yabing [1, 2]
Yang, Xun [4]
Wang, Xun [1, 2]
Wang, Meng [5]
Affiliations
[1] Zhejiang Gongshang Univ, Coll Comp Sci & Technol, Hangzhou 310035, Zhejiang, Peoples R China
[2] Zhejiang Key Lab Big Data & Future Ecommerce Techn, Hangzhou 310035, Peoples R China
[3] Univ Sci & Technol China, Hefei 230026, Peoples R China
[4] Univ Sci & Technol China, Sch Informat Sci & Technol, Dept Elect Engn & Informat Sci, Hefei 230026, Peoples R China
[5] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Noise; Data models; Videos; Noise robustness; Noise measurement; Task analysis; Visualization; Cross-Lingual transfer; cross-modal retrieval; machine translation; noise-robust fine-tuning; EMBEDDINGS;
DOI
10.1109/TKDE.2024.3400060
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Cross-lingual cross-modal retrieval aims to leverage human-labeled annotations in a source language to build cross-modal retrieval models for a new target language, since manually annotated datasets are scarce in low-resource (target) languages. In contrast to the growing body of work on monolingual cross-modal retrieval, cross-modal retrieval in the cross-lingual scenario has received less attention. A straightforward way to obtain target-language labeled data is to translate source-language datasets using machine translation (MT). However, MT is imperfect and tends to introduce noise during translation, which corrupts the textual embeddings and thereby degrades retrieval performance. To alleviate this, we propose Noise-Robust Fine-tuning (NRF), which tries to extract clean textual information from a possibly noisy target-language input with the guidance of its source-language counterpart. In addition, contrastive learning involving different modalities is performed to strengthen the noise robustness of our model. Unlike traditional cross-modal retrieval methods, which employ only image/video-text paired data for fine-tuning, in NRF selected parallel data plays a key role in improving the noise-filtering ability of our model. Extensive experiments are conducted on three video-text and image-text retrieval benchmarks across different target languages, and the results demonstrate that our method significantly improves the overall performance without using any image/video-text paired data on target languages.
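The abstract mentions contrastive learning across modalities but does not give the loss. As a rough illustration only, the sketch below shows a generic symmetric InfoNCE-style contrastive loss between text and visual embeddings; it is not the authors' NRF objective. The function names, the temperature of 0.07, and the toy two-dimensional embeddings are all assumptions made for this example.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy of matching queries[i] to keys[i] against all keys.

    Hypothetical helper; temperature=0.07 is an assumed value.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)

def symmetric_contrastive_loss(text_emb, visual_emb, temperature=0.07):
    """Average of the text-to-visual and visual-to-text InfoNCE terms."""
    return 0.5 * (info_nce(text_emb, visual_emb, temperature)
                  + info_nce(visual_emb, text_emb, temperature))

# Toy data (assumed): two visual items and their paired captions.
visual = [[1.0, 0.0], [0.0, 1.0]]
clean_text = [[0.9, 0.1], [0.1, 0.9]]   # embeddings close to their paired visuals
noisy_text = [[0.6, 0.5], [0.5, 0.6]]   # embeddings blurred, as translation noise might do

clean_loss = symmetric_contrastive_loss(clean_text, visual)
noisy_loss = symmetric_contrastive_loss(noisy_text, visual)
print(clean_loss < noisy_loss)
```

In this toy setting the blurred text embeddings yield a higher loss than the clean ones, which is the kind of signal a contrastive objective can use to pull noisy captions back toward their paired visual content.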
Pages: 5860-5873
Page count: 14
Related papers
50 in total
  • [21] A two-stage fine-tuning method for low-resource cross-lingual summarization
    Zhang, Kaixiong
    Zhang, Yongbing
    Yu, Zhengtao
    Huang, Yuxin
    Tan, Kaiwen
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2024, 21 (01) : 1125 - 1143
  • [22] A Cross-Modal and Cross-lingual Study of Iconicity in Language: Insights From Deep Learning
    de Varda, Andrea Gregor
    Strapparava, Carlo
    COGNITIVE SCIENCE, 2022, 46 (06)
  • [23] RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
    Zhou, Chulun
    Liang, Yunlong
    Meng, Fandong
    Xu, Jinan
    Su, Jinsong
    Zhou, Jie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 11747 - 11762
  • [24] Cross-Lingual Text Image Recognition via Multi-Hierarchy Cross-Modal Mimic
    Chen, Zhuo
    Yin, Fei
    Yang, Qing
    Liu, Cheng-Lin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 4830 - 4841
  • [25] Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
    Song, Yuqing
    Chen, Shizhe
    Jin, Qin
    Luo, Wei
    Xie, Jun
    Huang, Fei
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2843 - 2852
  • [26] Grounded and Well-rounded: A Methodological Approach to the Study of Cross-modal and Cross-lingual Grounding
    Mickus, Timothee
    Zosa, Elaine
    Paperno, Denis
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 11031 - 11042
  • [27] Discrete Robust Supervised Hashing for Cross-Modal Retrieval
    Yao, Tao
    Zhang, Zhiwang
    Yan, Lianshan
    Yue, Jun
    Tian, Qi
    IEEE ACCESS, 2019, 7 : 39806 - 39814
  • [28] Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval
    Cheng, Miaomiao
    Jing, Liping
    Ng, Michael K.
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2020, 38 (03)
  • [29] Nonlinear Robust Discrete Hashing for Cross-Modal Retrieval
    Yang, Zhan
    Long, Jun
    Zhu, Lei
    Huang, Wenti
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1349 - 1358
  • [30] Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training
    Zeng, Yan
    Zhou, Wangchunshu
    Luo, Ao
    Cheng, Ziming
    Zhang, Xinsong
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 5731 - 5746