Self-Supervised Correlation Learning for Cross-Modal Retrieval

Cited by: 29
Authors
Liu, Yaxin [1 ]
Wu, Jianlong [1 ]
Qu, Leigang [1 ]
Gan, Tian [1 ]
Yin, Jianhua [1 ]
Nie, Liqiang [1 ]
Affiliations
[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266237, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; self-supervised contrastive learning; mutual information estimation
DOI
10.1109/TMM.2022.3152086
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Cross-modal retrieval aims to retrieve relevant data from another modality when given a query of one modality. Although most existing methods that rely on the label information of multimedia data have achieved promising results, this performance comes at a high cost: labeling data demands enormous human effort, especially on large-scale multimedia datasets. Unsupervised cross-modal learning is therefore of crucial importance in real-world applications. In this paper, we propose a novel unsupervised cross-modal retrieval method, named Self-supervised Correlation Learning (SCL), which takes full advantage of large amounts of unlabeled data to learn discriminative and modality-invariant representations. Since unsupervised learning lacks the supervision of category labels, we derive a supervisory signal from the input itself by maximizing the mutual information between the input and the output of each modality-specific projector. In addition, to learn discriminative representations, we exploit unsupervised contrastive learning to model the relationships among intra- and inter-modality instances, pulling similar samples closer and pushing dissimilar samples apart. To further reduce the modality gap, we adopt a weight-sharing scheme and minimize a modality-invariance loss in the joint representation space. We also extend the proposed method to the semi-supervised setting. Extensive experiments on three widely used benchmark datasets demonstrate that our method achieves competitive results compared with current state-of-the-art cross-modal retrieval approaches.
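The abstract describes three ingredients: a mutual-information objective tied to the modality-specific projectors, intra- and inter-modality contrastive learning, and a weight-shared head trained with a modality-invariance loss in the joint space. The PyTorch sketch below illustrates how such pieces typically fit together. It is a minimal illustration under assumptions, not the authors' released implementation: all names (Projector, info_nce, shared_head, tau, the feature dimensions) are hypothetical, and InfoNCE is used here as a standard lower-bound surrogate for mutual information, which may differ from the paper's exact estimator.

```python
# Hypothetical sketch of the components named in the abstract:
# modality-specific projectors, a cross-modal InfoNCE loss, a
# weight-shared head, and a modality-invariance term. Not the
# paper's actual code; all names and dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Projector(nn.Module):
    """Modality-specific projector: backbone feature -> latent code."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )
    def forward(self, x):
        return self.net(x)

def info_nce(z_a, z_b, tau=0.07):
    """Symmetric InfoNCE: matched (i, i) pairs in the batch are
    positives, all other pairs are negatives. Minimizing this loss
    maximizes a lower bound on the mutual information between the
    two representations."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / tau                 # (B, B) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Hypothetical dimensions for a quick smoke test.
B, d_img, d_txt, d_lat = 8, 2048, 768, 256
img_proj, txt_proj = Projector(d_img, d_lat), Projector(d_txt, d_lat)
shared_head = nn.Linear(d_lat, d_lat)            # weights shared across modalities

img_feat, txt_feat = torch.randn(B, d_img), torch.randn(B, d_txt)
z_img = shared_head(img_proj(img_feat))          # joint representation space
z_txt = shared_head(txt_proj(txt_feat))

loss_inter = info_nce(z_img, z_txt)              # inter-modality contrast
# Modality-invariance term: matched joint-space codes should coincide.
loss_inv = F.mse_loss(F.normalize(z_img, dim=-1), F.normalize(z_txt, dim=-1))
loss = loss_inter + loss_inv
loss.backward()
```

In this sketch, an intra-modality term would apply the same info_nce loss to two augmented views of the same modality's features; the MSE invariance term is one common choice for shrinking the modality gap in the shared space.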
Pages: 2851-2863
Page count: 13
Related Papers (50 in total)
  • [41] Self-Supervised Cross-Modal Online Learning of Basic Object Affordances for Developmental Robotic Systems
    Ridge, Barry
    Skocaj, Danijel
    Leonardis, Ales
    2010 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2010, : 5047 - 5054
  • [42] CroSSL: Cross-modal Self-Supervised Learning for Time-series through Latent Masking
    Deldari, Shohreh
    Spathis, Dimitris
    Malekzadeh, Mohammad
    Kawsar, Fahim
    Salim, Flora D.
    Mathur, Akhil
    PROCEEDINGS OF THE 17TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2024, 2024, : 152 - 160
  • [43] Two-stage deep learning for supervised cross-modal retrieval
    Shao, Jie
    Zhao, Zhicheng
    Su, Fei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (12) : 16615 - 16631
  • [44] Supervised Contrastive Learning for 3D Cross-Modal Retrieval
    Choo, Yeon-Seung
    Kim, Boeun
    Kim, Hyun-Sik
    Park, Yong-Suk
    APPLIED SCIENCES-BASEL, 2024, 14 (22)
  • [45] Adaptively Unified Semi-supervised Learning for Cross-Modal Retrieval
    Zhang, Liang
    Ma, Bingpeng
    He, Jianfeng
    Li, Guorong
    Huang, Qingming
    Tian, Qi
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3406 - 3412
  • [47] Multimedia Feature Mapping and Correlation Learning for Cross-Modal Retrieval
    Yuan, Xu
    Zhong, Hua
    Chen, Zhikui
    Zhong, Fangming
    Hu, Yueming
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2018, 10 (03) : 29 - 45
  • [48] Deep Semantic Correlation with Adversarial Learning for Cross-Modal Retrieval
    Hua, Yan
    Du, Jianhe
    PROCEEDINGS OF 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2019), 2019, : 252 - 255
  • [49] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [50] CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
    Afham, Mohamed
    Dissanayake, Isuru
    Dissanayake, Dinithi
    Dharmasiri, Amaya
    Thilakarathna, Kanchana
    Rodrigo, Ranga
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 9892 - 9902