Cross-Modal Retrieval With Partially Mismatched Pairs

Cited: 34
Authors
Hu, Peng [1]
Huang, Zhenyu [1]
Peng, Dezhong [1,2,3]
Wang, Xu [1]
Peng, Xi [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Chengdu Ruibei Yingte Informat Technol Co Ltd, Chengdu 610054, Peoples R China
[3] Sichuan Zhiqian Technol Co Ltd, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation; National Key Research and Development Program of China;
Keywords
Complementary contrastive learning; cross-modal retrieval; mismatched pairs;
DOI
10.1109/TPAMI.2023.3247939
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study a challenging but less-touched problem in cross-modal retrieval, i.e., partially mismatched pairs (PMPs). Specifically, in real-world scenarios, a huge amount of multimedia data (e.g., the Conceptual Captions dataset) is collected from the Internet, and it is therefore inevitable that some irrelevant cross-modal pairs are wrongly treated as matched. Such a PMP problem will remarkably degrade cross-modal retrieval performance. To tackle this problem, we derive a unified theoretical Robust Cross-modal Learning framework (RCL) with an unbiased estimator of the cross-modal retrieval risk, which aims to endow cross-modal retrieval methods with robustness against PMPs. In detail, RCL adopts a novel complementary contrastive learning paradigm to address two challenges, namely overfitting and underfitting. On the one hand, our method only utilizes negative information, which is far less likely to be false than positive information, thus avoiding overfitting to PMPs. However, such robust strategies can induce underfitting and thereby make model training more difficult. On the other hand, to address the underfitting brought by weak supervision, we propose to leverage all available negative pairs to enhance the supervision contained in the negative information. Moreover, to further improve performance, we propose to minimize upper bounds of the risk so as to pay more attention to hard samples. To verify the effectiveness and robustness of the proposed method, we carry out comprehensive experiments on five widely-used benchmark datasets against nine state-of-the-art approaches on image-text and video-text retrieval tasks. The code is available at https://github.com/penghu-cs/RCL.
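The abstract only outlines the complementary contrastive learning idea; the snippet below is a minimal, illustrative PyTorch-style sketch of that idea, training only on in-batch negative pairs so that the possibly mismatched positive pairs are never fitted directly. It is not the authors' RCL implementation (see the GitHub repository above); the function name complementary_contrastive_loss, the temperature tau, and the in-batch negative construction are assumptions made purely for illustration.

import torch
import torch.nn.functional as F

def complementary_contrastive_loss(img_emb, txt_emb, tau=0.1):
    # Illustrative sketch only, not the official RCL loss.
    # Cosine similarities between every image and every text in the batch.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    sim = img_emb @ txt_emb.t() / tau                     # shape (B, B)

    B = sim.size(0)
    neg_mask = ~torch.eye(B, dtype=torch.bool, device=sim.device)

    # Complementary supervision: for each anchor, push down the softmax
    # probability of every non-paired (negative) candidate instead of
    # pulling the (possibly mismatched) diagonal pair together.
    p_i2t = sim.softmax(dim=1)
    p_t2i = sim.t().softmax(dim=1)
    neg_i2t = p_i2t.masked_select(neg_mask).clamp(max=1 - 1e-6)
    neg_t2i = p_t2i.masked_select(neg_mask).clamp(max=1 - 1e-6)
    return -(torch.log1p(-neg_i2t).mean() + torch.log1p(-neg_t2i).mean()) / 2

# Usage sketch: img_emb and txt_emb are batch embeddings produced by any
# two-branch image/text encoder, e.g.
# loss = complementary_contrastive_loss(torch.randn(32, 256), torch.randn(32, 256))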
Pages: 9595-9610
Number of Pages: 16
Related Papers
50 records in total
  • [1] Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
    Han, Haochen
    Zheng, Qinghua
    Dai, Guang
    Luo, Minnan
    Wang, Jingdong
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 26669 - 26678
  • [2] Online hashing with partially known labels for cross-modal retrieval
    Shu, Zhenqiu
    Li, Li
    Yu, Zhengtao
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 138
  • [3] Adversarial Cross-Modal Retrieval
    Wang, Bokun
    Yang, Yang
    Xu, Xing
    Hanjalic, Alan
    Shen, Heng Tao
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 154 - 162
  • [4] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [5] A semi-supervised cross-modal memory bank for cross-modal retrieval
    Huang, Yingying
    Hu, Bingliang
    Zhang, Yipeng
    Gao, Chi
    Wang, Quan
    NEUROCOMPUTING, 2024, 579
  • [6] Cross-Modal Center Loss for 3D Cross-Modal Retrieval
    Jing, Longlong
    Vahdani, Elahe
    Tan, Jiaxing
    Tian, Yingli
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 3141 - 3150
  • [7] Soft Contrastive Cross-Modal Retrieval
    Song, Jiayu
    Hu, Yuxuan
    Zhu, Lei
    Zhang, Chengyuan
    Zhang, Jian
    Zhang, Shichao
    APPLIED SCIENCES-BASEL, 2024, 14 (05):
  • [8] Probabilistic Embeddings for Cross-Modal Retrieval
    Chun, Sanghyuk
    Oh, Seong Joon
    de Rezende, Rafael Sampaio
    Kalantidis, Yannis
    Larlus, Diane
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 8411 - 8420
  • [9] Cross-modal Retrieval with Correspondence Autoencoder
    Feng, Fangxiang
    Wang, Xiaojie
    Li, Ruifan
    PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, : 7 - 16
  • [10] Cross-modal retrieval with dual optimization
    Xu, Qingzhen
    Liu, Shuang
    Qiao, Han
    Li, Miao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (05) : 7141 - 7157