CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning

Cited by: 204
Authors
Peng, Yuxin [1 ]
Qi, Jinwei [1 ]
Affiliation
[1] Peking Univ, Inst Comp Sci & Technol, 128th ZhongGuanCun North St, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Generative adversarial network; cross-modal adversarial mechanism; common representation learning; cross-modal retrieval
DOI
10.1145/3284750
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
The inconsistent distributions and representations of different modalities, such as image and text, cause the heterogeneity gap, which makes it very challenging to correlate heterogeneous data and measure their similarities. Recently, generative adversarial networks (GANs) have shown a strong ability to model data distributions and learn discriminative representations, and adversarial learning can be fully exploited to learn discriminative common representations that bridge the heterogeneity gap. Inspired by this, we aim to effectively correlate large-scale heterogeneous data of different modalities by using GANs to model the cross-modal joint distribution. In this article, we propose Cross-modal Generative Adversarial Networks (CM-GANs) with the following contributions. First, a cross-modal GAN architecture is proposed to model the joint distribution over the data of different modalities; inter-modality and intra-modality correlation are explored simultaneously in the generative and discriminative models, which compete with each other to promote cross-modal correlation learning. Second, cross-modal convolutional autoencoders with a weight-sharing constraint are proposed to form the generative model. They not only exploit the cross-modal correlation for learning the common representations but also preserve reconstruction information for capturing the semantic consistency within each modality. Third, a cross-modal adversarial training mechanism is proposed, which uses two kinds of discriminative models to simultaneously conduct intra-modality and inter-modality discrimination; the two mutually boost each other through adversarial training, making the generated common representations more discriminative. In summary, our proposed CM-GANs approach can use GANs to perform cross-modal common representation learning, by which heterogeneous data can be effectively correlated.
Extensive experiments are conducted to verify the performance of CM-GANs on cross-modal retrieval compared with 13 state-of-the-art methods on 4 cross-modal datasets.
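The architecture described above can be illustrated with a minimal, hypothetical PyTorch sketch: two modality-specific autoencoders whose top encoder layer is literally the same module (the weight-sharing constraint), plus an inter-modality discriminator that tries to tell which modality a common representation came from. All layer sizes, input dimensions, and loss choices here are illustrative assumptions, not the paper's actual configuration, and the intra-modality discriminators are omitted for brevity.

```python
import torch
import torch.nn as nn


class ModalityAutoencoder(nn.Module):
    """Encoder-decoder for one modality. The top encoder layer is a module
    shared across modalities, coupling the learned common representations."""

    def __init__(self, in_dim, hid_dim, shared_layer):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.shared = shared_layer  # weight-sharing constraint
        self.dec = nn.Sequential(
            nn.Linear(shared_layer.out_features, hid_dim),
            nn.ReLU(),
            nn.Linear(hid_dim, in_dim),
        )

    def forward(self, x):
        common = self.shared(self.enc(x))  # common representation
        recon = self.dec(common)           # preserves intra-modality info
        return common, recon


# One shared top layer ties the two encoders together.
common_dim = 64
shared = nn.Linear(128, common_dim)
img_ae = ModalityAutoencoder(in_dim=4096, hid_dim=128, shared_layer=shared)
txt_ae = ModalityAutoencoder(in_dim=300, hid_dim=128, shared_layer=shared)

# Inter-modality discriminator: predicts whether a common representation
# came from the image pathway (label 1) or the text pathway (label 0).
inter_disc = nn.Sequential(
    nn.Linear(common_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid()
)

img = torch.randn(8, 4096)  # stand-in image features
txt = torch.randn(8, 300)   # stand-in text features
img_common, img_recon = img_ae(img)
txt_common, txt_recon = txt_ae(txt)

# Adversarial sketch: the discriminator learns to separate the modalities,
# while the generative model is trained to fool it (not shown) and to
# minimize per-modality reconstruction error.
bce = nn.BCELoss()
d_loss = bce(inter_disc(img_common), torch.ones(8, 1)) + \
         bce(inter_disc(txt_common), torch.zeros(8, 1))
recon_loss = nn.functional.mse_loss(img_recon, img) + \
             nn.functional.mse_loss(txt_recon, txt)
print(img_common.shape, txt_common.shape)
```

Because both encoders end in the same `shared` layer, image and text representations land in one common space, where retrieval can be done by ordinary similarity search; the adversarial term pushes the two modality distributions in that space toward each other.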
Pages: 24