Self-Supervised Correlation Learning for Cross-Modal Retrieval

Cited: 29
Authors
Liu, Yaxin [1 ]
Wu, Jianlong [1 ]
Qu, Leigang [1 ]
Gan, Tian [1 ]
Yin, Jianhua [1 ]
Nie, Liqiang [1 ]
Affiliations
[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266237, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; self-supervised contrastive learning; mutual information estimation;
DOI
10.1109/TMM.2022.3152086
CLC Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Cross-modal retrieval aims to retrieve relevant data in one modality given a query from another modality. Although most existing methods rely on the label information of multimedia data and achieve promising results, this performance comes at a high cost: labeling data requires enormous labor, especially on large-scale multimedia datasets. Unsupervised cross-modal learning is therefore of crucial importance in real-world applications. In this paper, we propose a novel unsupervised cross-modal retrieval method, named Self-supervised Correlation Learning (SCL), which takes full advantage of large amounts of unlabeled data to learn discriminative and modality-invariant representations. Since unsupervised learning lacks the supervision of category labels, we derive a supervisory signal from the input itself by maximizing the mutual information between the input and the output of each modality-specific projector. In addition, to learn discriminative representations, we exploit unsupervised contrastive learning to model the relationships among intra- and inter-modality instances, pulling similar samples closer and pushing dissimilar samples apart. To further reduce the modality gap, we adopt a weight-sharing scheme and minimize a modality-invariant loss in the joint representation space. We also extend the proposed method to the semi-supervised setting. Extensive experiments on three widely used benchmark datasets demonstrate that our method achieves competitive results compared with current state-of-the-art cross-modal retrieval approaches.
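To make the abstract's three ingredients concrete, below is a minimal PyTorch sketch of (i) an InfoNCE-style contrastive loss over cross-modal pairs, which doubles as a mutual-information signal since InfoNCE lower-bounds MI, (ii) a weight-shared head mapping both modalities into a joint space, and (iii) a modality-invariant loss on paired embeddings. All module names, dimensions, the temperature, and the loss weighting are illustrative assumptions, not the authors' published implementation; the paper's actual architecture and MI estimator may differ.

```python
# Illustrative sketch only: module names, dimensions, temperature, and loss
# weighting are assumptions, not the SCL authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SCLSketch(nn.Module):
    """Two modality-specific projectors followed by a weight-shared head
    that maps both modalities into a joint representation space."""

    def __init__(self, img_dim=2048, txt_dim=768, hid=1024, out=256):
        super().__init__()
        self.img_proj = nn.Sequential(
            nn.Linear(img_dim, hid), nn.ReLU(), nn.Linear(hid, out))
        self.txt_proj = nn.Sequential(
            nn.Linear(txt_dim, hid), nn.ReLU(), nn.Linear(hid, out))
        # Weight-sharing scheme: the same head is applied to both modalities.
        self.shared_head = nn.Linear(out, out)

    def forward(self, img_feat, txt_feat):
        v = F.normalize(self.shared_head(self.img_proj(img_feat)), dim=-1)
        t = F.normalize(self.shared_head(self.txt_proj(txt_feat)), dim=-1)
        return v, t


def info_nce(a, b, tau=0.07):
    """InfoNCE over a batch: (a_i, b_i) is the positive pair, every other
    b_j serves as a negative. Maximizing it tightens a lower bound on the
    mutual information between the two sets of representations."""
    logits = a @ b.t() / tau                      # (N, N) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)


def scl_loss(v, t, lam=1.0):
    # Inter-modality contrast in both retrieval directions (image->text and
    # text->image). An intra-modality term over augmented views within one
    # modality would take the identical InfoNCE form and is omitted here.
    inter = info_nce(v, t) + info_nce(t, v)
    # Modality-invariant loss: directly shrink the gap between paired
    # embeddings in the joint space (an L2 form is assumed here).
    invariant = F.mse_loss(v, t)
    return inter + lam * invariant


# Usage with random stand-ins for pre-extracted image/text features:
model = SCLSketch()
v, t = model(torch.randn(32, 2048), torch.randn(32, 768))
scl_loss(v, t).backward()
```

Under this reading, the mutual-information and contrastive objectives described in the abstract are two uses of the same InfoNCE bound, while the weight-shared head and the invariance term both push the two modalities toward a common space.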
Pages: 2851-2863
Page count: 13
Related Papers
50 records in total
  • [31] Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity
    Sarkar, Pritam
    Etemad, Ali
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 9723 - 9732
  • [32] SCPNet: Unsupervised Cross-Modal Homography Estimation via Intra-modal Self-supervised Learning
    Zhang, Runmin
    Ma, Jun
    Cao, Si-Yuan
    Luo, Lun
    Yu, Beinan
    Chen, Shu-Jie
    Li, Junwei
    Shen, Hui-Liang
    COMPUTER VISION - ECCV 2024, PT XXIII, 2025, 15081 : 460 - 477
  • [33] Semi-supervised cross-modal learning for cross modal retrieval and image annotation
    Zou, Fuhao
    Bai, Xingqiang
    Luan, Chaoyang
    Li, Kai
    Wang, Yunfei
    Ling, Hefei
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, 22 (02): 825 - 841
  • [35] Cross-modal Self-Supervised Learning for Lip Reading: When Contrastive Learning meets Adversarial Training
    Sheng, Changchong
    Pietikainen, Matti
    Tian, Qi
    Liu, Li
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2456 - 2464
  • [36] Through-Wall Human Pose Reconstruction Based on Cross-Modal Learning and Self-Supervised Learning
    Zheng, Zhijie
    Zhang, Diankun
    Liang, Xiao
    Liu, Xiaojun
    Fang, Guangyou
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [38] A semi-supervised cross-modal memory bank for cross-modal retrieval
    Huang, Yingying
    Hu, Bingliang
    Zhang, Yipeng
    Gao, Chi
    Wang, Quan
    NEUROCOMPUTING, 2024, 579
  • [39] ICSF: Integrating Inter-Modal and Cross-Modal Learning Framework for Self-Supervised Heterogeneous Change Detection
    Zhang, Erlei
    Zong, He
    Li, Xinyu
    Feng, Mingchen
    Ren, Jinchang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [40] Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
    Wang, Xin
    Huang, Qiuyuan
    Celikyilmaz, Asli
    Gao, Jianfeng
    Shen, Dinghan
    Wang, Yuan-Fang
    Wang, William Yang
    Zhang, Lei
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 3622 - 6631