Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval

Cited by: 2
Authors
Wu, Hongchang [1 ]
Guan, Ziyu [2 ]
Zhi, Tao [3 ]
Zhao, Wei [1 ]
Xu, Cai [2 ]
Han, Hong [2 ]
Yang, Yaming [2 ]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian, Peoples R China
[2] Xidian Univ, Xian, Peoples R China
[3] Xidian Univ, Sch Artificial Intelligence, Xian, Peoples R China
Keywords
Cross-modal retrieval; graph attention; self-attention; generative adversarial network
DOI
10.1109/ICBK.2019.00043
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Existing cross-modal retrieval methods are mainly constrained to the bimodal case. When applied to the multi-modal case, we need to train O(K^2) (K: number of modalities) separate models, which is inefficient and unable to exploit common information among multiple modalities. Though some studies focused on learning a common space of multiple modalities for retrieval, they assumed data to be i.i.d. and failed to learn the underlying semantic structure which could be important for retrieval. To tackle this issue, we propose an extensive Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval (AGAT). AGAT synthesizes a self-attention network (SAT), a graph attention network (GAT) and a multi-modal generative adversarial network (MGAN). The SAT generates high-level embeddings for data items from different modalities, with self-attention capturing feature-level correlations in each modality. The GAT then uses attention to aggregate embeddings of matched items from different modalities to build a common embedding space. The MGAN aims to "cluster" matched embeddings of different modalities in the common space by forcing them to be similar to the aggregation. Finally, we train the common space so that it captures the semantic structure by constraining within-class/between-class distances. Experiments on three datasets show the effectiveness of AGAT.
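The attention-based aggregation the abstract describes (the GAT step: combining one matched item's embeddings from K modalities into a single common-space vector) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the attention vector `w` is a hypothetical learnable parameter, and the paper's actual parameterization may differ.

```python
import math

def attention_aggregate(embeddings, w):
    """Attention-weighted aggregation of per-modality embeddings into one
    common-space vector. Illustrative sketch only, not the paper's code.

    embeddings: list of K vectors (length d each), one per matched modality.
    w: length-d attention parameter (a hypothetical learnable weight).
    """
    # unnormalized attention score per modality: dot(e_k, w)
    scores = [sum(ei * wi for ei, wi in zip(e, w)) for e in embeddings]
    # numerically stable softmax over the K modalities
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [x / total for x in exps]
    # attention-weighted sum of the K embeddings -> aggregated vector
    d = len(embeddings[0])
    return [sum(a * e[j] for a, e in zip(alpha, embeddings)) for j in range(d)]

# toy usage: K = 3 modalities (e.g. image/text/audio), d = 4 dimensions
E = [[1.0, 0.0, 0.0, 2.0],
     [0.0, 1.0, 0.0, 2.0],
     [0.0, 0.0, 1.0, 2.0]]
w = [0.5, -0.5, 0.0, 0.1]
z = attention_aggregate(E, w)
```

Because the softmax weights sum to one, the aggregated vector is a convex combination of the modality embeddings; in the MGAN stage this aggregation serves as the target each modality's embedding is pushed toward.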
Pages: 265-272
Page count: 8
Related papers
50 items in total
  • [21] GRAPH PATTERN LOSS BASED DIVERSIFIED ATTENTION NETWORK FOR CROSS-MODAL RETRIEVAL
    Chen, Xueying
    Zhang, Rong
    Zhan, Yibing
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2391 - 2395
  • [22] Graph Convolutional Network Hashing for Cross-Modal Retrieval
    Xu, Ruiqing
    Li, Chao
    Yan, Junchi
    Deng, Cheng
    Liu, Xianglong
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 982 - 988
  • [23] Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval
    Qian, Shengsheng
    Xue, Dizhan
    Zhang, Huaiwen
    Fang, Quan
    Xu, Changsheng
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 2440 - 2448
  • [24] DA-GAN: Dual Attention Generative Adversarial Network for Cross-Modal Retrieval
    Cai, Liewu
    Zhu, Lei
    Zhang, Hongyan
    Zhu, Xinghui
    FUTURE INTERNET, 2022, 14 (02)
  • [25] Multi-modal Subspace Learning with Dropout regularization for Cross-modal Recognition and Retrieval
    Cao, Guanqun
    Waris, Muhammad Adeel
    Iosifidis, Alexandros
    Gabbouj, Moncef
    2016 SIXTH INTERNATIONAL CONFERENCE ON IMAGE PROCESSING THEORY, TOOLS AND APPLICATIONS (IPTA), 2016,
  • [26] Cross-Modal Graph Attention Network for Entity Alignment
    Xu, Baogui
    Xu, Chengjin
    Su, Bing
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3715 - 3723
  • [27] Complementarity is the king: Multi-modal and multi-grained hierarchical semantic enhancement network for cross-modal retrieval
    Pei, Xinlei
    Liu, Zheng
    Gao, Shanshan
    Su, Yijun
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 216
  • [28] Information Aggregation Semantic Adversarial Network for Cross-Modal Retrieval
    Wang, Hongfei
    Feng, Aimin
    Liu, Xuejun
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [29] Adversarial Modality Alignment Network for Cross-Modal Molecule Retrieval
    Zhao W.
    Zhou D.
    Cao B.
    Zhang K.
    Chen J.
IEEE Transactions on Artificial Intelligence, 2024, 5 (01) : 278 - 289
  • [30] A Graph Model for Cross-modal Retrieval
    Wang, Shixun
    Pan, Peng
    Lu, Yansheng
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON MULTIMEDIA TECHNOLOGY (ICMT-13), 2013, 84 : 1090 - 1097