Self-Supervised Correlation Learning for Cross-Modal Retrieval

Cited by: 29
Authors
Liu, Yaxin [1 ]
Wu, Jianlong [1 ]
Qu, Leigang [1 ]
Gan, Tian [1 ]
Yin, Jianhua [1 ]
Nie, Liqiang [1 ]
Affiliations
[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266237, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; self-supervised contrastive learning; mutual information estimation
DOI
10.1109/TMM.2022.3152086
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Cross-modal retrieval aims to retrieve relevant data from another modality when given a query of one modality. Although most existing methods that rely on the label information of multimedia data have achieved promising results, this performance comes at a high cost: labeling data demands enormous human effort, especially on large-scale multimedia datasets. Unsupervised cross-modal learning is therefore of crucial importance in real-world applications. In this paper, we propose a novel unsupervised cross-modal retrieval method, named Self-supervised Correlation Learning (SCL), which takes full advantage of large amounts of unlabeled data to learn discriminative and modality-invariant representations. Since unsupervised learning lacks the supervision of category labels, we derive a supervisory signal from the input itself by maximizing the mutual information between the input and the output of each modality-specific projector. In addition, to learn discriminative representations, we exploit unsupervised contrastive learning to model the relationships among intra- and inter-modality instances, pulling similar samples closer and pushing dissimilar samples apart. To further reduce the modality gap, we adopt a weight-sharing scheme and minimize a modality-invariance loss in the joint representation space. We also extend the proposed method to the semi-supervised setting. Extensive experiments on three widely used benchmark datasets demonstrate that our method achieves competitive results compared with current state-of-the-art cross-modal retrieval approaches.
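The abstract describes three ingredients: a mutual-information objective tied to the modality-specific projectors, intra- and inter-modality contrastive learning, and a weight-shared head trained with a modality-invariance loss in the joint space. The PyTorch sketch below illustrates how such pieces typically fit together. It is a minimal illustration under assumptions, not the authors' released implementation: all names (Projector, info_nce, shared_head, tau, the feature dimensions) are hypothetical, and InfoNCE is used here as a standard lower-bound surrogate for mutual information, which may differ from the paper's exact estimator.

```python
# Hypothetical sketch of the components named in the abstract:
# modality-specific projectors, a cross-modal InfoNCE loss, a
# weight-shared head, and a modality-invariance term. Not the
# paper's actual code; all names and dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Projector(nn.Module):
    """Modality-specific projector: backbone feature -> latent code."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )
    def forward(self, x):
        return self.net(x)

def info_nce(z_a, z_b, tau=0.07):
    """Symmetric InfoNCE: matched (i, i) pairs in the batch are
    positives, all other pairs are negatives. Minimizing this loss
    maximizes a lower bound on the mutual information between the
    two representations."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / tau                 # (B, B) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Hypothetical dimensions for a quick smoke test.
B, d_img, d_txt, d_lat = 8, 2048, 768, 256
img_proj, txt_proj = Projector(d_img, d_lat), Projector(d_txt, d_lat)
shared_head = nn.Linear(d_lat, d_lat)            # weights shared across modalities

img_feat, txt_feat = torch.randn(B, d_img), torch.randn(B, d_txt)
z_img = shared_head(img_proj(img_feat))          # joint representation space
z_txt = shared_head(txt_proj(txt_feat))

loss_inter = info_nce(z_img, z_txt)              # inter-modality contrast
# Modality-invariance term: matched joint-space codes should coincide.
loss_inv = F.mse_loss(F.normalize(z_img, dim=-1), F.normalize(z_txt, dim=-1))
loss = loss_inter + loss_inv
loss.backward()
```

In this sketch, an intra-modality term would apply the same info_nce loss to two augmented views of the same modality's features; the MSE invariance term is one common choice for shrinking the modality gap in the shared space.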
Pages: 2851-2863
Page count: 13
Related Papers (50 in total)
  • [41] Self-Supervised Cross-Modal Online Learning of Basic Object Affordances for Developmental Robotic Systems
    Ridge, Barry
    Skocaj, Danijel
    Leonardis, Ales
    2010 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2010, : 5047 - 5054
  • [42] CroSSL: Cross-modal Self-Supervised Learning for Time-series through Latent Masking
    Deldari, Shohreh
    Spathis, Dimitris
    Malekzadeh, Mohammad
    Kawsar, Fahim
    Salim, Flora D.
    Mathur, Akhil
    PROCEEDINGS OF THE 17TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2024, 2024, : 152 - 160
  • [43] Two-stage deep learning for supervised cross-modal retrieval
    Shao, Jie
    Zhao, Zhicheng
    Su, Fei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (12) : 16615 - 16631
  • [44] Supervised Contrastive Learning for 3D Cross-Modal Retrieval
    Choo, Yeon-Seung
    Kim, Boeun
    Kim, Hyun-Sik
    Park, Yong-Suk
    APPLIED SCIENCES-BASEL, 2024, 14 (22)
  • [45] Adaptively Unified Semi-supervised Learning for Cross-Modal Retrieval
    Zhang, Liang
    Ma, Bingpeng
    He, Jianfeng
    Li, Guorong
    Huang, Qingming
    Tian, Qi
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3406 - 3412
  • [47] Multimedia Feature Mapping and Correlation Learning for Cross-Modal Retrieval
    Yuan, Xu
    Zhong, Hua
    Chen, Zhikui
    Zhong, Fangming
    Hu, Yueming
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2018, 10 (03) : 29 - 45
  • [48] Deep Semantic Correlation with Adversarial Learning for Cross-Modal Retrieval
    Hua, Yan
    Du, Jianhe
    PROCEEDINGS OF 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2019), 2019, : 252 - 255
  • [49] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [50] CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
    Afham, Mohamed
    Dissanayake, Isuru
    Dissanayake, Dinithi
    Dharmasiri, Amaya
    Thilakarathna, Kanchana
    Rodrigo, Ranga
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 9892 - 9902