CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning

Cited by: 204
Authors
Peng, Yuxin [1 ]
Qi, Jinwei [1 ]
Affiliation
[1] Peking Univ, Inst Comp Sci & Technol, 128th ZhongGuanCun North St, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Generative adversarial network; cross-modal adversarial mechanism; common representation learning; cross-modal retrieval
DOI
10.1145/3284750
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
It is known that the inconsistent distributions and representations of different modalities, such as image and text, cause the heterogeneity gap, which makes it very challenging to correlate heterogeneous data and measure their similarities. Recently, generative adversarial networks (GANs) have been proposed and have shown their strong ability to model data distribution and learn discriminative representation. It has also been shown that adversarial learning can be fully exploited to learn discriminative common representations for bridging the heterogeneity gap. Inspired by this, we aim to effectively correlate large-scale heterogeneous data of different modalities with the power of GANs to model cross-modal joint distribution. In this article, we propose Cross-modal Generative Adversarial Networks (CM-GANs) with the following contributions. First, a cross-modal GAN architecture is proposed to model the joint distribution over the data of different modalities. The inter-modality and intra-modality correlation can be explored simultaneously in the generative and discriminative models, which compete with each other to promote cross-modal correlation learning. Second, cross-modal convolutional autoencoders with a weight-sharing constraint are proposed to form the generative model. They not only exploit the cross-modal correlation for learning the common representations but also preserve reconstruction information for capturing the semantic consistency within each modality. Third, a cross-modal adversarial training mechanism is proposed, which uses two kinds of discriminative models to simultaneously conduct intra-modality and inter-modality discrimination. These can mutually boost each other to make the generated common representations more discriminative through the adversarial training process. In summary, our proposed CM-GAN approach can use GANs to perform cross-modal common representation learning, by which heterogeneous data can be effectively correlated.
Extensive experiments are conducted to verify the performance of CM-GANs on cross-modal retrieval compared with 13 state-of-the-art methods on 4 cross-modal datasets.
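The generative/discriminative interplay described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the weight-sharing idea and an inter-modality discriminator, not the authors' implementation: all dimensions, initializations, and function names here are assumptions, and the intra-modality discriminators, reconstruction decoders, and training loop are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper).
D_IMG, D_TXT, D_HID, D_COMMON = 512, 300, 256, 128

# Modality-specific encoder weights (stand-ins for the paper's
# convolutional autoencoder pathways).
W_img = rng.normal(0.0, 0.1, (D_IMG, D_HID))
W_txt = rng.normal(0.0, 0.1, (D_TXT, D_HID))

# Weight-sharing constraint: the final projection into the common
# space is a single matrix shared by both modalities.
W_shared = rng.normal(0.0, 0.1, (D_HID, D_COMMON))

def encode_image(x):
    """Image pathway of the generative model -> common representation."""
    return np.tanh(x @ W_img) @ W_shared

def encode_text(t):
    """Text pathway of the generative model -> common representation."""
    return np.tanh(t @ W_txt) @ W_shared

# Inter-modality discriminator: scores whether a common representation
# came from the image modality; the generator would be trained to fool
# it so the two modalities become indistinguishable in the common space.
W_disc = rng.normal(0.0, 0.1, (D_COMMON, 1))

def discriminate(z):
    """Sigmoid score in (0, 1): estimated P(z is from the image modality)."""
    return 1.0 / (1.0 + np.exp(-(z @ W_disc)))

# Forward pass for a toy batch of 4 image/text feature pairs.
img_feat = rng.normal(size=(4, D_IMG))
txt_feat = rng.normal(size=(4, D_TXT))
z_img = encode_image(img_feat)
z_txt = encode_text(txt_feat)
```

Because both encoders end in the same `W_shared` projection, gradients from either modality update a common subspace, which is one simple way to realize the weight-sharing constraint the abstract describes.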
Pages: 24
Related Papers (50 records)
  • [21] Kang, Mingyu; Zhu, Ran; Chen, Duxin; Li, Chaojie; Gu, Wei; Qian, Xusheng; Yu, Wenwu. A Cross-Modal Generative Adversarial Network for Scenarios Generation of Renewable Energy. IEEE TRANSACTIONS ON POWER SYSTEMS, 2024, 39 (02): 2630-2640.
  • [22] Wang, Zheng; Xu, Xing; Wei, Jiwei; Xie, Ning; Shao, Jie; Yang, Yang. Quaternion Representation Learning for cross-modal matching. KNOWLEDGE-BASED SYSTEMS, 2023, 270.
  • [23] Cao, Wenming; Lin, Qiubin; He, Zhihai; He, Zhiquan. Hybrid representation learning for cross-modal retrieval. NEUROCOMPUTING, 2019, 345: 45-57.
  • [24] He, Shiyuan; Wang, Weiyang; Wang, Zheng; Xu, Xing; Yang, Yang; Wang, Xiaoming; Shen, Heng Tao. Category Alignment Adversarial Learning for Cross-Modal Retrieval. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (05): 4527-4538.
  • [25] Shang, Fei; Zhang, Huaxiang; Zhu, Lei; Sun, Jiande. Adversarial cross-modal retrieval based on dictionary learning. NEUROCOMPUTING, 2019, 355: 93-104.
  • [26] Lin, Zhenkai; Ji, Yanli; Yang, Yang. Independency Adversarial Learning for Cross-Modal Sound Separation. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024: 3522-3530.
  • [27] Li, Zhuoyi; Lu, Huibin; Fu, Hao; Wang, Zhongrui; Gu, Guanghun. Adaptive Adversarial Learning based cross-modal retrieval. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123.
  • [28] Xia, Yaxian; Wang, Wenmin; Han, Liang. Dual Subspaces with Adversarial Learning for Cross-Modal Retrieval. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I, 2018, 11164: 654-663.
  • [29] Xu, Xing; He, Li; Lu, Huimin; Gao, Lianli; Ji, Yanli. Deep adversarial metric learning for cross-modal retrieval. World Wide Web, 2019, 22: 657-672.
  • [30] Shang, Fei; Zhang, Huaxiang; Sun, Jiande; Nie, Liqiang; Liu, Li. Cross-modal dual subspace learning with adversarial network. NEURAL NETWORKS, 2020, 126: 132-142.