CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning

Cited by: 204
Authors
Peng, Yuxin [1 ]
Qi, Jinwei [1 ]
Affiliation
[1] Peking Univ, Inst Comp Sci & Technol, 128th ZhongGuanCun North St, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Generative adversarial network; cross-modal adversarial mechanism; common representation learning; cross-modal retrieval
DOI
10.1145/3284750
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
It is known that the inconsistent distributions and representations of different modalities, such as image and text, cause the heterogeneity gap, which makes it very challenging to correlate heterogeneous data and measure their similarities. Recently, generative adversarial networks (GANs) have been proposed and have shown their strong ability to model data distribution and learn discriminative representation. It has also been shown that adversarial learning can be fully exploited to learn discriminative common representations for bridging the heterogeneity gap. Inspired by this, we aim to effectively correlate large-scale heterogeneous data of different modalities with the power of GANs to model cross-modal joint distribution. In this article, we propose Cross-modal Generative Adversarial Networks (CM-GANs) with the following contributions. First, a cross-modal GAN architecture is proposed to model the joint distribution over the data of different modalities. The inter-modality and intra-modality correlation can be explored simultaneously in the generative and discriminative models, which compete with each other to promote cross-modal correlation learning. Second, cross-modal convolutional autoencoders with a weight-sharing constraint are proposed to form the generative model. They not only exploit the cross-modal correlation for learning the common representations but also preserve reconstruction information for capturing the semantic consistency within each modality. Third, a cross-modal adversarial training mechanism is proposed, which uses two kinds of discriminative models to simultaneously conduct intra-modality and inter-modality discrimination. These can mutually boost each other to make the generated common representations more discriminative through the adversarial training process. In summary, our proposed CM-GAN approach can use GANs to perform cross-modal common representation learning, by which heterogeneous data can be effectively correlated.
Extensive experiments are conducted to verify the performance of CM-GANs on cross-modal retrieval compared with 13 state-of-the-art methods on 4 cross-modal datasets.
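The generative/discriminative interplay described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the weight-sharing idea and an inter-modality discriminator, not the authors' implementation: all dimensions, initializations, and function names here are assumptions, and the intra-modality discriminators, reconstruction decoders, and training loop are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper).
D_IMG, D_TXT, D_HID, D_COMMON = 512, 300, 256, 128

# Modality-specific encoder weights (stand-ins for the paper's
# convolutional autoencoder pathways).
W_img = rng.normal(0.0, 0.1, (D_IMG, D_HID))
W_txt = rng.normal(0.0, 0.1, (D_TXT, D_HID))

# Weight-sharing constraint: the final projection into the common
# space is a single matrix shared by both modalities.
W_shared = rng.normal(0.0, 0.1, (D_HID, D_COMMON))

def encode_image(x):
    """Image pathway of the generative model -> common representation."""
    return np.tanh(x @ W_img) @ W_shared

def encode_text(t):
    """Text pathway of the generative model -> common representation."""
    return np.tanh(t @ W_txt) @ W_shared

# Inter-modality discriminator: scores whether a common representation
# came from the image modality; the generator would be trained to fool
# it so the two modalities become indistinguishable in the common space.
W_disc = rng.normal(0.0, 0.1, (D_COMMON, 1))

def discriminate(z):
    """Sigmoid score in (0, 1): estimated P(z is from the image modality)."""
    return 1.0 / (1.0 + np.exp(-(z @ W_disc)))

# Forward pass for a toy batch of 4 image/text feature pairs.
img_feat = rng.normal(size=(4, D_IMG))
txt_feat = rng.normal(size=(4, D_TXT))
z_img = encode_image(img_feat)
z_txt = encode_text(txt_feat)
```

Because both encoders end in the same `W_shared` projection, gradients from either modality update a common subspace, which is one simple way to realize the weight-sharing constraint the abstract describes.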
Pages: 24
Related Papers (50 records)
  • [21] Kang, Mingyu; Zhu, Ran; Chen, Duxin; Li, Chaojie; Gu, Wei; Qian, Xusheng; Yu, Wenwu. A Cross-Modal Generative Adversarial Network for Scenarios Generation of Renewable Energy. IEEE TRANSACTIONS ON POWER SYSTEMS, 2024, 39 (02): 2630-2640.
  • [22] Wang, Zheng; Xu, Xing; Wei, Jiwei; Xie, Ning; Shao, Jie; Yang, Yang. Quaternion Representation Learning for cross-modal matching. KNOWLEDGE-BASED SYSTEMS, 2023, 270.
  • [23] Cao, Wenming; Lin, Qiubin; He, Zhihai; He, Zhiquan. Hybrid representation learning for cross-modal retrieval. NEUROCOMPUTING, 2019, 345: 45-57.
  • [24] He, Shiyuan; Wang, Weiyang; Wang, Zheng; Xu, Xing; Yang, Yang; Wang, Xiaoming; Shen, Heng Tao. Category Alignment Adversarial Learning for Cross-Modal Retrieval. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (05): 4527-4538.
  • [25] Shang, Fei; Zhang, Huaxiang; Zhu, Lei; Sun, Jiande. Adversarial cross-modal retrieval based on dictionary learning. NEUROCOMPUTING, 2019, 355: 93-104.
  • [26] Lin, Zhenkai; Ji, Yanli; Yang, Yang. Independency Adversarial Learning for Cross-Modal Sound Separation. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024: 3522-3530.
  • [27] Li, Zhuoyi; Lu, Huibin; Fu, Hao; Wang, Zhongrui; Gu, Guanghun. Adaptive Adversarial Learning based cross-modal retrieval. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123.
  • [28] Xia, Yaxian; Wang, Wenmin; Han, Liang. Dual Subspaces with Adversarial Learning for Cross-Modal Retrieval. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I, 2018, 11164: 654-663.
  • [29] Xu, Xing; He, Li; Lu, Huimin; Gao, Lianli; Ji, Yanli. Deep adversarial metric learning for cross-modal retrieval. World Wide Web, 2019, 22: 657-672.
  • [30] Shang, Fei; Zhang, Huaxiang; Sun, Jiande; Nie, Liqiang; Liu, Li. Cross-modal dual subspace learning with adversarial network. NEURAL NETWORKS, 2020, 126: 132-142.