CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning

Cited by: 204
Authors
Peng, Yuxin [1 ]
Qi, Jinwei [1 ]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, 128th ZhongGuanCun North St, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Generative adversarial network; cross-modal adversarial mechanism; common representation learning; cross-modal retrieval;
DOI
10.1145/3284750
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
It is known that the inconsistent distributions and representations of different modalities, such as image and text, cause the heterogeneity gap, which makes it very challenging to correlate heterogeneous data and measure their similarities. Recently, generative adversarial networks (GANs) have been proposed and have shown a strong ability to model data distributions and learn discriminative representations. It has also been shown that adversarial learning can be fully exploited to learn discriminative common representations for bridging the heterogeneity gap. Inspired by this, we aim to effectively correlate large-scale heterogeneous data of different modalities with the power of GANs to model the cross-modal joint distribution. In this article, we propose Cross-modal Generative Adversarial Networks (CM-GANs) with the following contributions. First, a cross-modal GAN architecture is proposed to model the joint distribution over the data of different modalities. The inter-modality and intra-modality correlation can be explored simultaneously in the generative and discriminative models, which compete with each other to promote cross-modal correlation learning. Second, cross-modal convolutional autoencoders with a weight-sharing constraint are proposed to form the generative model. They not only exploit the cross-modal correlation for learning the common representations but also preserve reconstruction information for capturing the semantic consistency within each modality. Third, a cross-modal adversarial training mechanism is proposed, which uses two kinds of discriminative models to simultaneously conduct intra-modality and inter-modality discrimination. These mutually boost each other to make the generated common representations more discriminative through the adversarial training process. In summary, our proposed CM-GANs approach can use GANs to perform cross-modal common representation learning, by which heterogeneous data can be effectively correlated.
Extensive experiments are conducted to verify the performance of CM-GANs on cross-modal retrieval compared with 13 state-of-the-art methods on 4 cross-modal datasets.
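The generative side of the architecture described in the abstract (modality-specific encoders feeding a weight-sharing layer that produces the common representation, plus per-modality reconstruction decoders) can be sketched in a few lines of NumPy. The layer sizes, activations, and the cosine-similarity stand-in for the inter-modality discriminator below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    """Create a dense layer as a (weights, bias) pair with scaled Gaussian init."""
    return rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, out_dim)), np.zeros(out_dim)

def forward(x, layer):
    W, b = layer
    return np.tanh(x @ W + b)

# Hypothetical feature sizes: 4096-d CNN image features, 300-d text vectors.
img_enc = dense(4096, 512)   # image-specific encoder layer
txt_enc = dense(300, 512)    # text-specific encoder layer
shared  = dense(512, 200)    # weight-sharing layer: the SAME weights map both
                             # modalities into the 200-d common space

# Decoders reconstruct the original features, preserving semantic
# consistency within each modality.
img_dec = dense(200, 4096)
txt_dec = dense(200, 300)

def encode(img_feat, txt_feat):
    """Map both modalities into the common representation space."""
    c_img = forward(forward(img_feat, img_enc), shared)
    c_txt = forward(forward(txt_feat, txt_enc), shared)
    return c_img, c_txt

def inter_modality_score(c_a, c_b):
    """Toy stand-in for the inter-modality discriminator: cosine similarity
    of common-space codes (high for matched pairs, low for mismatched)."""
    num = np.sum(c_a * c_b, axis=1)
    den = np.linalg.norm(c_a, axis=1) * np.linalg.norm(c_b, axis=1)
    return num / den

img = rng.normal(size=(2, 4096))
txt = rng.normal(size=(2, 300))
c_img, c_txt = encode(img, txt)
recon_img = forward(c_img, img_dec)   # reconstruction branch (image)
recon_txt = forward(c_txt, txt_dec)   # reconstruction branch (text)
```

In the full model, the encoders, the shared layer, and the decoders would be trained jointly against the intra-modality and inter-modality discriminators; this sketch only shows the data flow that produces the common representation.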
Pages: 24
Related Papers
50 records
  • [1] Learning cross-modal visual-tactile representation using ensembled generative adversarial networks
    Li, Xinwu
    Liu, Huaping
    Zhou, Junfeng
    Sun, FuChun
    COGNITIVE COMPUTATION AND SYSTEMS, 2019, 1 (02) : 40 - 44
  • [2] Representation separation adversarial networks for cross-modal retrieval
    Deng, Jiaxin
    Ou, Weihua
    Gou, Jianping
    Song, Heping
    Wang, Anzhi
    Xu, Xing
    WIRELESS NETWORKS, 2024, 30 (05) : 3469 - 3481
  • [3] SYNCGAN: SYNCHRONIZE THE LATENT SPACES OF CROSS-MODAL GENERATIVE ADVERSARIAL NETWORKS
    Chen, Wen-Cheng
    Chen, Chien-Wen
    Hu, Min-Chun
    2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
  • [4] Unsupervised Generative Adversarial Cross-Modal Hashing
    Zhang, Jian
    Peng, Yuxin
    Yuan, Mingkuan
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 539 - 546
  • [5] Cross-Modal Search for Social Networks via Adversarial Learning
    Zhou, Nan
    Du, Junping
    Xue, Zhe
    Liu, Chong
    Li, Jinxuan
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2020, 2020
  • [6] Semi-supervised cross-modal image generation with generative adversarial networks
    Li, Dan
    Du, Changde
    He, Huiguang
    PATTERN RECOGNITION, 2020, 100
  • [7] Cross-Modal Learning with Adversarial Samples
    Li, Chao
    Deng, Cheng
    Gao, Shangqian
    Xie, De
    Liu, Wei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [8] Common Semantic Representation Method Based on Object Attention and Adversarial Learning for Cross-Modal Data in IoV
    Kou, Feifei
    Du, Junping
    Cui, Wanqiu
    Shi, Lei
    Cheng, Pengchao
    Chen, Jiannan
    Li, Jinxuan
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2019, 68 (12) : 11588 - 11598
  • [9] Cross-modal Common Representation Learning by Hybrid Transfer Network
    Huang, Xin
    Peng, Yuxin
    Yuan, Mingkuan
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1893 - 1900
  • [10] Cross-Modal Semantic Matching Generative Adversarial Networks for Text-to-Image Synthesis
    Tan, Hongchen
    Liu, Xiuping
    Yin, Baocai
    Li, Xin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 832 - 845