CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning

Cited by: 204
Authors
Peng, Yuxin [1]
Qi, Jinwei [1]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, 128th ZhongGuanCun North St, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Generative adversarial network; cross-modal adversarial mechanism; common representation learning; cross-modal retrieval
DOI
10.1145/3284750
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
It is known that the inconsistent distributions and representations of different modalities, such as image and text, cause the heterogeneity gap, which makes it very challenging to correlate heterogeneous data and measure their similarities. Recently, generative adversarial networks (GANs) have been proposed and have shown their strong ability to model data distributions and learn discriminative representations. It has also been shown that adversarial learning can be fully exploited to learn discriminative common representations for bridging the heterogeneity gap. Inspired by this, we aim to effectively correlate large-scale heterogeneous data of different modalities with the power of GANs to model the cross-modal joint distribution. In this article, we propose Cross-modal Generative Adversarial Networks (CM-GANs) with the following contributions. First, a cross-modal GAN architecture is proposed to model the joint distribution over the data of different modalities. The inter-modality and intra-modality correlations can be explored simultaneously by the generative and discriminative models, which compete with each other to promote cross-modal correlation learning. Second, cross-modal convolutional autoencoders with a weight-sharing constraint are proposed to form the generative model. They not only exploit the cross-modal correlation for learning the common representations but also preserve reconstruction information for capturing the semantic consistency within each modality. Third, a cross-modal adversarial training mechanism is proposed, which uses two kinds of discriminative models to simultaneously conduct intra-modality and inter-modality discrimination. They mutually boost each other to make the generated common representations more discriminative through the adversarial training process. In summary, our proposed CM-GANs approach can use GANs to perform cross-modal common representation learning, by which heterogeneous data can be effectively correlated. Extensive experiments verify the performance of CM-GANs on cross-modal retrieval against 13 state-of-the-art methods on 4 cross-modal datasets.
Pages: 24
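The abstract describes three moving parts: a generative model built from two convolutional autoencoders whose encoders share weights at the common-representation layer, plus two kinds of discriminators, one intra-modality and one inter-modality. As a rough orientation, here is a minimal PyTorch sketch of how such a structure could be wired up. It is not the authors' released implementation: the layer sizes, the use of MLPs over pre-extracted features instead of full convolutional stacks, and the names ModalityAutoencoder, make_discriminator, and shared_layer are all illustrative assumptions.

```python
# Minimal sketch of the CM-GANs structure described in the abstract.
# Only the overall design (two modality autoencoders with a weight-shared
# encoder layer as the generator, plus intra- and inter-modality
# discriminators) follows the text; everything else is an assumption.
import torch
import torch.nn as nn

class ModalityAutoencoder(nn.Module):
    """Autoencoder branch for one modality.

    Modality features are assumed to be pre-extracted vectors (e.g.,
    CNN image features or text embeddings), so encoder/decoder are
    plain MLPs here for brevity.
    """
    def __init__(self, in_dim: int, common_dim: int, shared: nn.Linear):
        super().__init__()
        self.encode_private = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU())
        # Weight-sharing constraint: the SAME layer object maps both
        # modalities into the common representation space.
        self.encode_shared = shared
        self.decode = nn.Sequential(nn.Linear(common_dim, 1024), nn.ReLU(),
                                    nn.Linear(1024, in_dim))

    def forward(self, x):
        common = self.encode_shared(self.encode_private(x))
        recon = self.decode(common)  # reconstruction preserves intra-modality semantics
        return common, recon

def make_discriminator(in_dim: int) -> nn.Module:
    """Binary discriminator, used both intra-modality (original feature
    vs. reconstruction) and inter-modality (image code vs. text code)."""
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 1))

# Generative model: two autoencoders sharing the final encoder layer.
IMG_DIM, TXT_DIM, COMMON_DIM = 4096, 300, 200  # assumed feature dimensions
shared_layer = nn.Linear(1024, COMMON_DIM)
gen_img = ModalityAutoencoder(IMG_DIM, COMMON_DIM, shared_layer)
gen_txt = ModalityAutoencoder(TXT_DIM, COMMON_DIM, shared_layer)

# Two kinds of discriminative models, per the third contribution.
d_intra_img = make_discriminator(IMG_DIM)   # real vs. reconstructed image feature
d_intra_txt = make_discriminator(TXT_DIM)   # real vs. reconstructed text feature
d_inter = make_discriminator(COMMON_DIM)    # which modality produced a common code

img = torch.randn(8, IMG_DIM)
txt = torch.randn(8, TXT_DIM)
c_img, r_img = gen_img(img)
c_txt, r_txt = gen_txt(txt)
```

In training, the generators and discriminators would be updated alternately, as in standard GAN training: d_inter tries to tell image codes from text codes while the generator tries to make them indistinguishable, which is what aligns the two modalities in the common space, while reconstruction plus intra-modality discrimination keep each representation faithful to its own modality.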
Related Papers
50 records in total
  • [31] Adversarial Learning for Cross-Modal Retrieval with Wasserstein Distance
    Cheng, Qingrong
    Zhang, Youcai
    Gu, Xiaodong
    NEURAL INFORMATION PROCESSING (ICONIP 2019), PT I, 2019, 11953 : 16 - 29
  • [32] Unsupervised Cross-Modal Retrieval through Adversarial Learning
    He, Li
    Xu, Xing
    Lu, Huimin
    Yang, Yang
    Shen, Fumin
    Shen, Heng Tao
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 1153 - 1158
  • [33] Deep adversarial metric learning for cross-modal retrieval
    Xu, Xing
    He, Li
    Lu, Huimin
    Gao, Lianli
    Ji, Yanli
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, 22 (02) : 657 - 672
  • [34] Adversarial Cross-Modal Retrieval
    Wang, Bokun
    Yang, Yang
    Xu, Xing
    Hanjalic, Alan
    Shen, Heng Tao
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 154 - 162
  • [35] Cross-modal Adversarial Reprogramming
    Neekhara, Paarth
    Hussain, Shehzeen
    Du, Jinglong
    Dubnov, Shlomo
    Koushanfar, Farinaz
    McAuley, Julian
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 2898 - 2906
  • [36] A Cross-Modal Tactile Reproduction Utilizing Tactile and Visual Information Generated by Conditional Generative Adversarial Networks
    Hatori, Koki
    Morikura, Takashi
    Funahashi, Akira
    Takemura, Kenjiro
    IEEE ACCESS, 2025, 13 : 9223 - 9229
  • [37] Generative Adversarial Network Based Asymmetric Deep Cross-Modal Unsupervised Hashing
    Cao, Yuan
    Gao, Yaru
    Chen, Na
    Lin, Jiacheng
    Chen, Sheng
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT I, 2024, 14487 : 30 - 48
  • [38] Modality-specific and shared generative adversarial network for cross-modal retrieval
    Wu, Fei
    Jing, Xiao-Yuan
    Wu, Zhiyong
    Ji, Yimu
    Dong, Xiwei
    Luo, Xiaokai
    Huang, Qinghua
    Wang, Ruchuan
    PATTERN RECOGNITION, 2020, 104
  • [39] GCMA: Generative Cross-Modal Transferable Adversarial Attacks from Images to Videos
    Chen, Kai
    Wei, Zhipeng
    Chen, Jingjing
    Wu, Zuxuan
    Jiang, Yu-Gang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 698 - 708
  • [40] Multi-Pathway Generative Adversarial Hashing for Unsupervised Cross-Modal Retrieval
    Zhang, Jian
    Peng, Yuxin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (01) : 174 - 187