Cross-Modal Retrieval With Partially Mismatched Pairs

Cited by: 34
Authors
Hu, Peng [1 ]
Huang, Zhenyu [1 ]
Peng, Dezhong [1 ,2 ,3 ]
Wang, Xu [1 ]
Peng, Xi [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Chengdu Ruibei Yingte Informat Technol Co Ltd, Chengdu 610054, Peoples R China
[3] Sichuan Zhiqian Technol Co Ltd, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation; National Key Research and Development Program of China;
Keywords
Complementary contrastive learning; cross-modal retrieval; mismatched pairs;
DOI
10.1109/TPAMI.2023.3247939
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study a challenging but under-explored problem in cross-modal retrieval, i.e., partially mismatched pairs (PMPs). In real-world scenarios, vast amounts of multimedia data (e.g., the Conceptual Captions dataset) are harvested from the Internet, so some irrelevant cross-modal pairs are inevitably treated as matched. Such a PMP problem remarkably degrades cross-modal retrieval performance. To tackle this problem, we derive a unified theoretical Robust Cross-modal Learning framework (RCL) with an unbiased estimator of the cross-modal retrieval risk, which aims to endow cross-modal retrieval methods with robustness against PMPs. Specifically, RCL adopts a novel complementary contrastive learning paradigm to address two challenges: overfitting and underfitting. On the one hand, our method uses only the negative information, which is far less likely to be false than the positive information, thereby avoiding overfitting to PMPs. Such robust strategies, however, can cause underfitting and thus make models harder to train. On the other hand, to address the underfitting brought by this weak supervision, we propose leveraging all available negative pairs to enhance the supervision contained in the negative information. Moreover, to further improve performance, we propose minimizing upper bounds of the risk so that more attention is paid to hard samples. To verify the effectiveness and robustness of the proposed method, we carry out comprehensive experiments on five widely used benchmark datasets, comparing against nine state-of-the-art approaches on image-text and video-text retrieval tasks. The code is available at https://github.com/penghu-cs/RCL.
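For intuition, the following is a minimal PyTorch sketch of the negative-only (complementary) supervision idea the abstract describes: rather than pulling the possibly mismatched diagonal pairs together, it only pushes apart the off-diagonal pairs in a batch, which are almost certainly true negatives. The function name, temperature value, and the exact loss form here are illustrative assumptions, not the authors' released RCL estimator; see the linked repository for the actual implementation.

```python
import torch
import torch.nn.functional as F

def complementary_contrastive_loss(img_emb, txt_emb, tau=0.05, eps=1e-8):
    """Sketch of negative-only (complementary) contrastive supervision.

    Instead of maximizing the diagonal (possibly mismatched) pair
    probabilities, minimize the probability mass that a row-wise softmax
    over similarities assigns to each off-diagonal (surely negative) pair.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    sims = img_emb @ txt_emb.t() / tau          # (N, N) similarity matrix
    probs = sims.softmax(dim=1)                 # row-wise matching distribution
    n = sims.size(0)
    neg_mask = ~torch.eye(n, dtype=torch.bool, device=sims.device)
    neg_probs = probs[neg_mask].clamp(max=1.0 - eps)
    # -mean log(1 - p_ij) over all j != i: drives negative-pair probability
    # toward zero, which implicitly raises the matched-pair probability.
    return -torch.log1p(-neg_probs).mean()
```

In practice such a loss would be symmetrized over both retrieval directions, e.g. 0.5 * (complementary_contrastive_loss(img, txt) + complementary_contrastive_loss(txt, img)), and because every off-diagonal pair in the batch contributes, the negative supervision is dense enough to counteract the underfitting concern the abstract raises.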
Pages: 9595-9610
Number of pages: 16
Related Papers
50 records in total (items [31]-[40] shown)
  • [31] Cross-Modal Topic Correlations for Multimedia Retrieval
    Yu, Jing
    Cong, Yonghui
    Qin, Zengchang
    Wan, Tao
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 246 - 249
  • [32] Deep Relation Embedding for Cross-Modal Retrieval
    Zhang, Yifan
    Zhou, Wengang
    Wang, Min
    Tian, Qi
    Li, Houqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 617 - 627
  • [33] Learning Cross-Modal Retrieval with Noisy Labels
    Hu, Peng
    Peng, Xi
    Zhu, Hongyuan
    Zhen, Liangli
    Lin, Jie
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5399 - 5409
  • [34] The State of the Art for Cross-Modal Retrieval: A Survey
    Zhou, Kun
    Hassan, Fadratul Hafinaz
    Hoon, Gan Keng
    IEEE ACCESS, 2023, 11 : 138568 - 138589
  • [35] Special issue on cross-modal retrieval and analysis
    Wu, Jianlong
    Hong, Richang
    Tian, Qi
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (04) : 523 - 524
  • [36] Cross-Modal Retrieval with Correlation Feature Propagation
    Zhang L.
    Cao F.
    Liang X.
    Qian Y.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2022, 59 (09): : 1993 - 2002
  • [37] Token Embeddings Alignment for Cross-Modal Retrieval
    Xie, Chen-Wei
    Wu, Jianmin
    Zheng, Yun
    Pan, Pan
    Hua, Xian-Sheng
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4555 - 4563
  • [38] Semantic consistency hashing for cross-modal retrieval
    Yao, Tao
    Kong, Xiangwei
    Fu, Haiyan
    Tian, Qi
    NEUROCOMPUTING, 2016, 193 : 250 - 259
  • [39] Multi-modal semantic autoencoder for cross-modal retrieval
    Wu, Yiling
    Wang, Shuhui
    Huang, Qingming
    NEUROCOMPUTING, 2019, 331 : 165 - 175
  • [40] GrowBit: Incremental Hashing for Cross-Modal Retrieval
    Mandal, Devraj
    Annadani, Yashas
    Biswas, Soma
    COMPUTER VISION - ACCV 2018, PT IV, 2019, 11364 : 305 - 321