Cross-Modal Retrieval With Partially Mismatched Pairs

Cited by: 34
Authors
Hu, Peng [1 ]
Huang, Zhenyu [1 ]
Peng, Dezhong [1 ,2 ,3 ]
Wang, Xu [1 ]
Peng, Xi [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Chengdu Ruibei Yingte Informat Technol Co Ltd, Chengdu 610054, Peoples R China
[3] Sichuan Zhiqian Technol Co Ltd, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation; National Key Research and Development Program of China;
Keywords
Complementary contrastive learning; cross-modal retrieval; mismatched pairs;
DOI
10.1109/TPAMI.2023.3247939
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study a challenging but less-touched problem in cross-modal retrieval, i.e., partially mismatched pairs (PMPs). In real-world scenarios, huge amounts of multimedia data (e.g., the Conceptual Captions dataset) are collected from the Internet, so some irrelevant cross-modal pairs are inevitably treated as matched. Such a PMP problem can remarkably degrade cross-modal retrieval performance. To tackle this problem, we derive a unified theoretical Robust Cross-modal Learning framework (RCL) with an unbiased estimator of the cross-modal retrieval risk, which endows cross-modal retrieval methods with robustness against PMPs. Specifically, RCL adopts a novel complementary contrastive learning paradigm to address two challenges: overfitting and underfitting. On the one hand, our method utilizes only the negative information, which is much less likely to be false than the positive information, thus avoiding overfitting to PMPs. However, such robust strategies could induce underfitting, making models harder to train. On the other hand, to address the underfitting brought by this weak supervision, we propose leveraging all available negative pairs to enhance the supervision contained in the negative information. Moreover, to further improve performance, we propose minimizing the upper bounds of the risk so as to pay more attention to hard samples. To verify the effectiveness and robustness of the proposed method, we carry out comprehensive experiments on five widely used benchmark datasets, comparing against nine state-of-the-art approaches on image-text and video-text retrieval tasks. The code is available at https://github.com/penghu-cs/RCL.
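To make the complementary, negative-only objective concrete, below is a minimal PyTorch sketch of a contrastive loss that ignores the (possibly mismatched) positive pairs and penalizes only the cross-modal negatives in a mini-batch. It is an illustration under our own assumptions, not the authors' RCL implementation (see the linked repository for that): the function name complementary_contrastive_loss, the sigmoid-based per-pair loss, and the temperature tau = 0.07 are ours.

import torch
import torch.nn.functional as F

def complementary_contrastive_loss(img_emb, txt_emb, tau=0.07):
    # img_emb, txt_emb: (N, D) embeddings whose i-th rows form the
    # (possibly mismatched) cross-modal pairs of a mini-batch.
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    sim = img_emb @ txt_emb.t() / tau              # (N, N) similarity matrix
    off_diag = ~torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    # -log(1 - sigmoid(s)) == softplus(s): push every negative similarity down.
    # Averaging over all N*(N-1) off-diagonal pairs uses every available
    # negative, strengthening the weak supervision; replacing .mean() with a
    # top-k average would emphasize hard negatives, in the spirit of
    # minimizing an upper bound of the risk.
    return F.softplus(sim[off_diag]).mean()

# Toy usage with random features standing in for encoder outputs.
imgs, txts = torch.randn(8, 128), torch.randn(8, 128)
loss = complementary_contrastive_loss(imgs, txts)

Note that a purely negative objective of this kind would be degenerate in isolation (all similarities can simply be driven down); in the paper it serves as the robust building block inside the full risk-estimation framework.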
Pages: 9595-9610
Page count: 16
Related papers
50 items in total
  • [21] Sequential Learning for Cross-modal Retrieval
    Song, Ge
    Tan, Xiaoyang
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 4531 - 4539
  • [22] Correspondence Autoencoders for Cross-Modal Retrieval
    Feng, Fangxiang
    Wang, Xiaojie
    Li, Ruifan
    Ahmad, Ibrar
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2015, 12 (01)
  • [23] Cross-modal Retrieval with Label Completion
    Xu, Xing
    Shen, Fumin
    Yang, Yang
    Shen, Heng Tao
    He, Li
    Song, Jingkuan
    MM'16: PROCEEDINGS OF THE 2016 ACM MULTIMEDIA CONFERENCE, 2016, : 302 - 306
  • [24] FedCMR: Federated Cross-Modal Retrieval
    Zong, Linlin
    Xie, Qiujie
    Zhou, Jiahui
    Wu, Peiran
    Zhang, Xianchao
    Xu, Bo
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1672 - 1676
  • [25] Cross-Modal Retrieval Using Deep Learning
    Malik, Shaily
    Bhardwaj, Nikhil
    Bhardwaj, Rahul
    Kumar, Saurabh
    PROCEEDINGS OF THIRD DOCTORAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE, DOSCI 2022, 2023, 479 : 725 - 734
  • [26] Multi-Label Cross-modal Retrieval
    Ranjan, Viresh
    Rasiwasia, Nikhil
    Jawahar, C. V.
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 4094 - 4102
  • [27] Online weighted hashing for cross-modal retrieval
    Jiang, Zining
    Weng, Zhenyu
    Li, Runhao
    Zhuang, Huiping
    Lin, Zhiping
    PATTERN RECOGNITION, 2025, 161
  • [28] Multi-modal and cross-modal for lecture videos retrieval
    Nhu Van Nguyen
    Coustaty, Mickaël
    Ogier, Jean-Marc
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 2667 - 2672
  • [29] Cross-modal retrieval based on shared proxies
    Wei, Yuxin
    Zheng, Ligang
    Qiu, Guoping
    Cai, Guocan
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2024, 13 (01)
  • [30] Deep Semantic Mapping for Cross-Modal Retrieval
    Wang, Cheng
    Yang, Haojin
    Meinel, Christoph
    2015 IEEE 27TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2015), 2015, : 234 - 241