CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning

Cited by: 204
Authors
Peng, Yuxin [1]
Qi, Jinwei [1]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, 128th ZhongGuanCun North St, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Generative adversarial network; cross-modal adversarial mechanism; common representation learning; cross-modal retrieval
DOI
10.1145/3284750
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
It is known that the inconsistent distributions and representations of different modalities, such as image and text, cause the heterogeneity gap, which makes it very challenging to correlate heterogeneous data and measure their similarities. Recently, generative adversarial networks (GANs) have been proposed and have shown their strong ability to model data distributions and learn discriminative representations. It has also been shown that adversarial learning can be fully exploited to learn discriminative common representations for bridging the heterogeneity gap. Inspired by this, we aim to effectively correlate large-scale heterogeneous data of different modalities with the power of GANs to model the cross-modal joint distribution. In this article, we propose Cross-modal Generative Adversarial Networks (CM-GANs) with the following contributions. First, a cross-modal GAN architecture is proposed to model the joint distribution over the data of different modalities. The inter-modality and intra-modality correlations can be explored simultaneously by the generative and discriminative models, which compete with each other to promote cross-modal correlation learning. Second, cross-modal convolutional autoencoders with a weight-sharing constraint are proposed to form the generative model. They not only exploit the cross-modal correlation for learning the common representations but also preserve reconstruction information for capturing the semantic consistency within each modality. Third, a cross-modal adversarial training mechanism is proposed, which uses two kinds of discriminative models to simultaneously conduct intra-modality and inter-modality discrimination. They mutually boost each other to make the generated common representations more discriminative through the adversarial training process. In summary, our proposed CM-GANs approach can use GANs to perform cross-modal common representation learning, by which heterogeneous data can be effectively correlated. Extensive experiments verify the performance of CM-GANs on cross-modal retrieval against 13 state-of-the-art methods on 4 cross-modal datasets.
Pages: 24
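The abstract describes three moving parts: a generative model built from two convolutional autoencoders whose encoders share weights at the common-representation layer, plus two kinds of discriminators, one intra-modality and one inter-modality. As a rough orientation, here is a minimal PyTorch sketch of how such a structure could be wired up. It is not the authors' released implementation: the layer sizes, the use of MLPs over pre-extracted features instead of full convolutional stacks, and the names ModalityAutoencoder, make_discriminator, and shared_layer are all illustrative assumptions.

```python
# Minimal sketch of the CM-GANs structure described in the abstract.
# Only the overall design (two modality autoencoders with a weight-shared
# encoder layer as the generator, plus intra- and inter-modality
# discriminators) follows the text; everything else is an assumption.
import torch
import torch.nn as nn

class ModalityAutoencoder(nn.Module):
    """Autoencoder branch for one modality.

    Modality features are assumed to be pre-extracted vectors (e.g.,
    CNN image features or text embeddings), so encoder/decoder are
    plain MLPs here for brevity.
    """
    def __init__(self, in_dim: int, common_dim: int, shared: nn.Linear):
        super().__init__()
        self.encode_private = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU())
        # Weight-sharing constraint: the SAME layer object maps both
        # modalities into the common representation space.
        self.encode_shared = shared
        self.decode = nn.Sequential(nn.Linear(common_dim, 1024), nn.ReLU(),
                                    nn.Linear(1024, in_dim))

    def forward(self, x):
        common = self.encode_shared(self.encode_private(x))
        recon = self.decode(common)  # reconstruction preserves intra-modality semantics
        return common, recon

def make_discriminator(in_dim: int) -> nn.Module:
    """Binary discriminator, used both intra-modality (original feature
    vs. reconstruction) and inter-modality (image code vs. text code)."""
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 1))

# Generative model: two autoencoders sharing the final encoder layer.
IMG_DIM, TXT_DIM, COMMON_DIM = 4096, 300, 200  # assumed feature dimensions
shared_layer = nn.Linear(1024, COMMON_DIM)
gen_img = ModalityAutoencoder(IMG_DIM, COMMON_DIM, shared_layer)
gen_txt = ModalityAutoencoder(TXT_DIM, COMMON_DIM, shared_layer)

# Two kinds of discriminative models, per the third contribution.
d_intra_img = make_discriminator(IMG_DIM)   # real vs. reconstructed image feature
d_intra_txt = make_discriminator(TXT_DIM)   # real vs. reconstructed text feature
d_inter = make_discriminator(COMMON_DIM)    # which modality produced a common code

img = torch.randn(8, IMG_DIM)
txt = torch.randn(8, TXT_DIM)
c_img, r_img = gen_img(img)
c_txt, r_txt = gen_txt(txt)
```

In training, the generators and discriminators would be updated alternately, as in standard GAN training: d_inter tries to tell image codes from text codes while the generator tries to make them indistinguishable, which is what aligns the two modalities in the common space, while reconstruction plus intra-modality discrimination keep each representation faithful to its own modality.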
Related Papers
50 records in total
  • [31] Adversarial Learning for Cross-Modal Retrieval with Wasserstein Distance
    Cheng, Qingrong
    Zhang, Youcai
    Gu, Xiaodong
    NEURAL INFORMATION PROCESSING (ICONIP 2019), PT I, 2019, 11953 : 16 - 29
  • [32] Unsupervised Cross-Modal Retrieval through Adversarial Learning
    He, Li
    Xu, Xing
    Lu, Huimin
    Yang, Yang
    Shen, Fumin
    Shen, Heng Tao
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 1153 - 1158
  • [33] Deep adversarial metric learning for cross-modal retrieval
    Xu, Xing
    He, Li
    Lu, Huimin
    Gao, Lianli
    Ji, Yanli
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, 22 (02) : 657 - 672
  • [34] Adversarial Cross-Modal Retrieval
    Wang, Bokun
    Yang, Yang
    Xu, Xing
    Hanjalic, Alan
    Shen, Heng Tao
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 154 - 162
  • [35] Cross-modal Adversarial Reprogramming
    Neekhara, Paarth
    Hussain, Shehzeen
    Du, Jinglong
    Dubnov, Shlomo
    Koushanfar, Farinaz
    McAuley, Julian
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 2898 - 2906
  • [36] A Cross-Modal Tactile Reproduction Utilizing Tactile and Visual Information Generated by Conditional Generative Adversarial Networks
    Hatori, Koki
    Morikura, Takashi
    Funahashi, Akira
    Takemura, Kenjiro
    IEEE ACCESS, 2025, 13 : 9223 - 9229
  • [37] Generative Adversarial Network Based Asymmetric Deep Cross-Modal Unsupervised Hashing
    Cao, Yuan
    Gao, Yaru
    Chen, Na
    Lin, Jiacheng
    Chen, Sheng
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT I, 2024, 14487 : 30 - 48
  • [38] Modality-specific and shared generative adversarial network for cross-modal retrieval
    Wu, Fei
    Jing, Xiao-Yuan
    Wu, Zhiyong
    Ji, Yimu
    Dong, Xiwei
    Luo, Xiaokai
    Huang, Qinghua
    Wang, Ruchuan
    PATTERN RECOGNITION, 2020, 104
  • [39] GCMA: Generative Cross-Modal Transferable Adversarial Attacks from Images to Videos
    Chen, Kai
    Wei, Zhipeng
    Chen, Jingjing
    Wu, Zuxuan
    Jiang, Yu-Gang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 698 - 708
  • [40] Multi-Pathway Generative Adversarial Hashing for Unsupervised Cross-Modal Retrieval
    Zhang, Jian
    Peng, Yuxin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (01) : 174 - 187