CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning

Cited by: 204
Authors
Peng, Yuxin [1 ]
Qi, Jinwei [1 ]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, 128th ZhongGuanCun North St, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Generative adversarial network; cross-modal adversarial mechanism; common representation learning; cross-modal retrieval;
DOI
10.1145/3284750
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
It is known that the inconsistent distributions and representations of different modalities, such as image and text, cause the heterogeneity gap, which makes it very challenging to correlate heterogeneous data and measure their similarities. Recently, generative adversarial networks (GANs) have been proposed and have shown a strong ability to model data distributions and learn discriminative representations. It has also been shown that adversarial learning can be fully exploited to learn discriminative common representations for bridging the heterogeneity gap. Inspired by this, we aim to effectively correlate large-scale heterogeneous data of different modalities with the power of GANs to model the cross-modal joint distribution. In this article, we propose Cross-modal Generative Adversarial Networks (CM-GANs) with the following contributions. First, a cross-modal GAN architecture is proposed to model the joint distribution over the data of different modalities. The inter-modality and intra-modality correlation can be explored simultaneously in the generative and discriminative models, which compete with each other to promote cross-modal correlation learning. Second, cross-modal convolutional autoencoders with a weight-sharing constraint are proposed to form the generative model. They not only exploit the cross-modal correlation for learning the common representations but also preserve reconstruction information for capturing the semantic consistency within each modality. Third, a cross-modal adversarial training mechanism is proposed, which uses two kinds of discriminative models to simultaneously conduct intra-modality and inter-modality discrimination. These mutually boost each other to make the generated common representations more discriminative through the adversarial training process. In summary, our proposed CM-GANs approach can use GANs to perform cross-modal common representation learning, by which heterogeneous data can be effectively correlated.
Extensive experiments are conducted to verify the performance of CM-GANs on cross-modal retrieval compared with 13 state-of-the-art methods on 4 cross-modal datasets.
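The generative side of the architecture described in the abstract (modality-specific encoders feeding a weight-sharing layer that produces the common representation, plus per-modality reconstruction decoders) can be sketched in a few lines of NumPy. The layer sizes, activations, and the cosine-similarity stand-in for the inter-modality discriminator below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    """Create a dense layer as a (weights, bias) pair with scaled Gaussian init."""
    return rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, out_dim)), np.zeros(out_dim)

def forward(x, layer):
    W, b = layer
    return np.tanh(x @ W + b)

# Hypothetical feature sizes: 4096-d CNN image features, 300-d text vectors.
img_enc = dense(4096, 512)   # image-specific encoder layer
txt_enc = dense(300, 512)    # text-specific encoder layer
shared  = dense(512, 200)    # weight-sharing layer: the SAME weights map both
                             # modalities into the 200-d common space

# Decoders reconstruct the original features, preserving semantic
# consistency within each modality.
img_dec = dense(200, 4096)
txt_dec = dense(200, 300)

def encode(img_feat, txt_feat):
    """Map both modalities into the common representation space."""
    c_img = forward(forward(img_feat, img_enc), shared)
    c_txt = forward(forward(txt_feat, txt_enc), shared)
    return c_img, c_txt

def inter_modality_score(c_a, c_b):
    """Toy stand-in for the inter-modality discriminator: cosine similarity
    of common-space codes (high for matched pairs, low for mismatched)."""
    num = np.sum(c_a * c_b, axis=1)
    den = np.linalg.norm(c_a, axis=1) * np.linalg.norm(c_b, axis=1)
    return num / den

img = rng.normal(size=(2, 4096))
txt = rng.normal(size=(2, 300))
c_img, c_txt = encode(img, txt)
recon_img = forward(c_img, img_dec)   # reconstruction branch (image)
recon_txt = forward(c_txt, txt_dec)   # reconstruction branch (text)
```

In the full model, the encoders, the shared layer, and the decoders would be trained jointly against the intra-modality and inter-modality discriminators; this sketch only shows the data flow that produces the common representation.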
Pages: 24
Related Papers
50 records
  • [1] Learning cross-modal visual-tactile representation using ensembled generative adversarial networks
    Li, Xinwu
    Liu, Huaping
    Zhou, Junfeng
    Sun, FuChun
    COGNITIVE COMPUTATION AND SYSTEMS, 2019, 1 (02) : 40 - 44
  • [2] Representation separation adversarial networks for cross-modal retrieval
    Deng, Jiaxin
    Ou, Weihua
    Gou, Jianping
    Song, Heping
    Wang, Anzhi
    Xu, Xing
    WIRELESS NETWORKS, 2024, 30 (05) : 3469 - 3481
  • [3] SYNCGAN: SYNCHRONIZE THE LATENT SPACES OF CROSS-MODAL GENERATIVE ADVERSARIAL NETWORKS
    Chen, Wen-Cheng
    Chen, Chien-Wen
    Hu, Min-Chun
    2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
  • [4] Unsupervised Generative Adversarial Cross-Modal Hashing
    Zhang, Jian
    Peng, Yuxin
    Yuan, Mingkuan
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 539 - 546
  • [5] Cross-Modal Search for Social Networks via Adversarial Learning
    Zhou, Nan
    Du, Junping
    Xue, Zhe
    Liu, Chong
    Li, Jinxuan
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2020, 2020
  • [6] Semi-supervised cross-modal image generation with generative adversarial networks
    Li, Dan
    Du, Changde
    He, Huiguang
    PATTERN RECOGNITION, 2020, 100
  • [7] Cross-Modal Learning with Adversarial Samples
    Li, Chao
    Deng, Cheng
    Gao, Shangqian
    Xie, De
    Liu, Wei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [8] Common Semantic Representation Method Based on Object Attention and Adversarial Learning for Cross-Modal Data in IoV
    Kou, Feifei
    Du, Junping
    Cui, Wanqiu
    Shi, Lei
    Cheng, Pengchao
    Chen, Jiannan
    Li, Jinxuan
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2019, 68 (12) : 11588 - 11598
  • [9] Cross-modal Common Representation Learning by Hybrid Transfer Network
    Huang, Xin
    Peng, Yuxin
    Yuan, Mingkuan
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1893 - 1900
  • [10] Cross-Modal Semantic Matching Generative Adversarial Networks for Text-to-Image Synthesis
    Tan, Hongchen
    Liu, Xiuping
    Yin, Baocai
    Li, Xin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 832 - 845