Cross-Lingual Cross-Modal Retrieval With Noise-Robust Fine-Tuning

Cited by: 0
Authors
Cai, Rui [1, 2]
Dong, Jianfeng [1, 3]
Liang, Tianxiang [1, 2]
Liang, Yonghui [1, 2]
Wang, Yabing [1, 2]
Yang, Xun [4]
Wang, Xun [1, 2]
Wang, Meng [5]
Affiliations
[1] Zhejiang Gongshang Univ, Coll Comp Sci & Technol, Hangzhou 310035, Zhejiang, Peoples R China
[2] Zhejiang Key Lab Big Data & Future Ecommerce Techn, Hangzhou 310035, Peoples R China
[3] Univ Sci & Technol China, Hefei 230026, Peoples R China
[4] Univ Sci & Technol China, Sch Informat Sci & Technol, Dept Elect Engn & Informat Sci, Hefei 230026, Peoples R China
[5] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Noise; Data models; Videos; Noise robustness; Noise measurement; Task analysis; Visualization; Cross-lingual transfer; Cross-modal retrieval; Machine translation; Noise-robust fine-tuning; Embeddings
DOI
10.1109/TKDE.2024.3400060
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-lingual cross-modal retrieval aims to leverage human-labeled annotations in a source language to build cross-modal retrieval models for a new target language, since manually annotated datasets are lacking in low-resource (target) languages. In contrast to the growing body of work on monolingual cross-modal retrieval, less research has focused on cross-modal retrieval in the cross-lingual scenario. A straightforward way to obtain target-language labeled data is to translate source-language datasets using machine translation (MT). However, because MT is imperfect, it tends to introduce noise during translation, which corrupts the textual embeddings and thereby compromises retrieval performance. To alleviate this, we propose Noise-Robust Fine-tuning (NRF), which tries to extract clean textual information from a possibly noisy target-language input with the guidance of its source-language counterpart. In addition, contrastive learning involving different modalities is performed to strengthen the noise robustness of our model. Unlike traditional cross-modal retrieval methods, which employ only image/video-text paired data for fine-tuning, in NRF selected parallel data plays a key role in improving the noise-filtering ability of our model. Extensive experiments are conducted on three video-text and image-text retrieval benchmarks across different target languages, and the results demonstrate that our method significantly improves overall performance without using any image/video-text paired data in the target languages.
Pages: 5860-5873
Page count: 14