Cross-Lingual Cross-Modal Retrieval With Noise-Robust Fine-Tuning

Cited by: 0
Authors
Cai, Rui [1 ,2 ]
Dong, Jianfeng [1 ,3 ]
Liang, Tianxiang [1 ,2 ]
Liang, Yonghui [1 ,2 ]
Wang, Yabing [1 ,2 ]
Yang, Xun [4 ]
Wang, Xun [1 ,2 ]
Wang, Meng [5 ]
Affiliations
[1] Zhejiang Gongshang Univ, Coll Comp Sci & Technol, Hangzhou 310035, Zhejiang, Peoples R China
[2] Zhejiang Key Lab Big Data & Future Ecommerce Techn, Hangzhou 310035, Peoples R China
[3] Univ Sci & Technol China, Hefei 230026, Peoples R China
[4] Univ Sci & Technol China, Sch Informat Sci & Technol, Dept Elect Engn & Informat Sci, Hefei 230026, Peoples R China
[5] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Noise; Data models; Videos; Noise robustness; Noise measurement; Task analysis; Visualization; Cross-Lingual transfer; cross-modal retrieval; machine translation; noise-robust fine-tuning; EMBEDDINGS;
DOI
10.1109/TKDE.2024.3400060
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-lingual cross-modal retrieval aims to leverage human-labeled annotations in a source language to construct cross-modal retrieval models for a new target language, motivated by the lack of manually annotated datasets in low-resource (target) languages. In contrast to the rapid progress in monolingual cross-modal retrieval, cross-modal retrieval in the cross-lingual scenario has received far less attention. A straightforward way to obtain target-language labeled data is to translate source-language datasets with Machine Translation (MT). However, since MT is imperfect, it tends to introduce noise during translation, corrupting the textual embeddings and thereby degrading retrieval performance. To alleviate this, we propose Noise-Robust Fine-tuning (NRF), which extracts clean textual information from a possibly noisy target-language input under the guidance of its source-language counterpart. In addition, contrastive learning across different modalities is performed to strengthen the noise robustness of our model. Unlike traditional cross-modal retrieval methods, which fine-tune only on image/video-text paired data, NRF relies on selected parallel data to improve the model's noise-filtering ability. Extensive experiments on three video-text and image-text retrieval benchmarks across different target languages demonstrate that our method significantly improves overall performance without using any image/video-text paired data in the target languages.
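The abstract does not spell out the exact contrastive objective NRF uses; as a hedged illustration only, the kind of cross-modal contrastive learning it mentions is commonly realized as a symmetric InfoNCE loss over in-batch video/text (or image/text) pairs, which can be sketched as follows (function name, batch layout, and temperature value are assumptions, not taken from the paper):

```python
import numpy as np

def info_nce(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of `video_emb` and row i of `text_emb` form a matched pair
    (the positive); every other in-batch pairing serves as a negative.
    """
    # L2-normalize so the dot product equals cosine similarity
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])        # diagonal entries are positives

    def xent(l):
        # row-wise cross-entropy against the diagonal, numerically stable
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average the two retrieval directions: video->text and text->video
    return 0.5 * (xent(logits) + xent(logits.T))
```

Pulling the matched pair together while pushing apart mismatched (and, in the noisy-translation setting, corrupted) pairs is what gives such an objective its robustness benefit.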
Pages: 5860-5873
Page count: 14
Related Papers
50 records
  • [1] Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning
    Wang, Yabing
    Dong, Jianfeng
    Liang, Tianxiang
    Zhang, Minsong
    Cai, Rui
    Wang, Xun
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022,
  • [2] Cross-lingual Cross-modal Pretraining for Multimodal Retrieval
    Fei, Hongliang
    Yu, Tan
    Li, Ping
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 3644 - 3650
  • [3] Consistency Regularization for Cross-Lingual Fine-Tuning
    Zheng, Bo
    Dong, Li
    Huang, Shaohan
    Wang, Wenhui
    Chi, Zewen
    Singhal, Saksham
    Che, Wanxiang
    Liu, Ting
    Song, Xia
    Wei, Furu
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 3403 - 3417
  • [4] Noise-robust Deep Cross-Modal Hashing
    Wang, Runmin
    Yu, Guoxian
    Zhang, Hong
    Guo, Maozu
    Cui, Lizhen
    Zhang, Xiangliang
    INFORMATION SCIENCES, 2021, 581 : 136 - 154
  • [5] Composable Sparse Fine-Tuning for Cross-Lingual Transfer
    Ansell, Alan
    Ponti, Edoardo Maria
    Korhonen, Anna
    Vulic, Ivan
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 1778 - 1796
  • [6] Effective Fine-tuning Methods for Cross-lingual Adaptation
    Yu, Tao
    Joty, Shafiq
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 8492 - 8501
  • [7] CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer
    Wang, Yabing
    Wang, Fan
    Dong, Jianfeng
    Luo, Hao
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 5651 - 5659
  • [8] Dual-View Curricular Optimal Transport for Cross-Lingual Cross-Modal Retrieval
    Wang, Yabing
    Wang, Shuhui
    Luo, Hao
    Dong, Jianfeng
    Wang, Fan
    Han, Meng
    Wang, Xun
    Wang, Meng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 1522 - 1533
  • [9] RGBT Tracking via Noise-Robust Cross-Modal Ranking
    Li, Chenglong
    Xiang, Zhiqiang
    Tang, Jin
    Luo, Bin
    Wang, Futian
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (09) : 5019 - 5031
  • [10] Cross-lingual Intermediate Fine-tuning improves Dialogue State Tracking
    Moghe, Nikita
    Steedman, Mark
    Birch, Alexandra
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 1137 - 1150