CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning

Cited by: 204
Authors
Peng, Yuxin [1 ]
Qi, Jinwei [1 ]
Affiliation
[1] Peking Univ, Inst Comp Sci & Technol, 128th ZhongGuanCun North St, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Generative adversarial network; cross-modal adversarial mechanism; common representation learning; cross-modal retrieval
DOI
10.1145/3284750
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
The inconsistent distributions and representations of different modalities, such as image and text, cause the heterogeneity gap, which makes it very challenging to correlate heterogeneous data and measure their similarities. Recently, generative adversarial networks (GANs) have shown a strong ability to model data distributions and learn discriminative representations, and adversarial learning can be fully exploited to learn discriminative common representations that bridge the heterogeneity gap. Inspired by this, we aim to effectively correlate large-scale heterogeneous data of different modalities by using GANs to model the cross-modal joint distribution. In this article, we propose Cross-modal Generative Adversarial Networks (CM-GANs) with the following contributions. First, a cross-modal GAN architecture is proposed to model the joint distribution over the data of different modalities; inter-modality and intra-modality correlation are explored simultaneously in the generative and discriminative models, which compete with each other to promote cross-modal correlation learning. Second, cross-modal convolutional autoencoders with a weight-sharing constraint are proposed to form the generative model. They not only exploit the cross-modal correlation for learning the common representations but also preserve reconstruction information for capturing the semantic consistency within each modality. Third, a cross-modal adversarial training mechanism is proposed, which uses two kinds of discriminative models to simultaneously conduct intra-modality and inter-modality discrimination; the two mutually boost each other through adversarial training, making the generated common representations more discriminative. In summary, our proposed CM-GANs approach can use GANs to perform cross-modal common representation learning, by which heterogeneous data can be effectively correlated.
Extensive experiments are conducted to verify the performance of CM-GANs on cross-modal retrieval compared with 13 state-of-the-art methods on 4 cross-modal datasets.
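The architecture described above can be illustrated with a minimal, hypothetical PyTorch sketch: two modality-specific autoencoders whose top encoder layer is literally the same module (the weight-sharing constraint), plus an inter-modality discriminator that tries to tell which modality a common representation came from. All layer sizes, input dimensions, and loss choices here are illustrative assumptions, not the paper's actual configuration, and the intra-modality discriminators are omitted for brevity.

```python
import torch
import torch.nn as nn


class ModalityAutoencoder(nn.Module):
    """Encoder-decoder for one modality. The top encoder layer is a module
    shared across modalities, coupling the learned common representations."""

    def __init__(self, in_dim, hid_dim, shared_layer):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.shared = shared_layer  # weight-sharing constraint
        self.dec = nn.Sequential(
            nn.Linear(shared_layer.out_features, hid_dim),
            nn.ReLU(),
            nn.Linear(hid_dim, in_dim),
        )

    def forward(self, x):
        common = self.shared(self.enc(x))  # common representation
        recon = self.dec(common)           # preserves intra-modality info
        return common, recon


# One shared top layer ties the two encoders together.
common_dim = 64
shared = nn.Linear(128, common_dim)
img_ae = ModalityAutoencoder(in_dim=4096, hid_dim=128, shared_layer=shared)
txt_ae = ModalityAutoencoder(in_dim=300, hid_dim=128, shared_layer=shared)

# Inter-modality discriminator: predicts whether a common representation
# came from the image pathway (label 1) or the text pathway (label 0).
inter_disc = nn.Sequential(
    nn.Linear(common_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid()
)

img = torch.randn(8, 4096)  # stand-in image features
txt = torch.randn(8, 300)   # stand-in text features
img_common, img_recon = img_ae(img)
txt_common, txt_recon = txt_ae(txt)

# Adversarial sketch: the discriminator learns to separate the modalities,
# while the generative model is trained to fool it (not shown) and to
# minimize per-modality reconstruction error.
bce = nn.BCELoss()
d_loss = bce(inter_disc(img_common), torch.ones(8, 1)) + \
         bce(inter_disc(txt_common), torch.zeros(8, 1))
recon_loss = nn.functional.mse_loss(img_recon, img) + \
             nn.functional.mse_loss(txt_recon, txt)
print(img_common.shape, txt_common.shape)
```

Because both encoders end in the same `shared` layer, image and text representations land in one common space, where retrieval can be done by ordinary similarity search; the adversarial term pushes the two modality distributions in that space toward each other.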
Pages: 24