Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval

Cited by: 2
Authors
Wu, Hongchang [1 ]
Guan, Ziyu [2 ]
Zhi, Tao [3 ]
Zhao, Wei [1 ]
Xu, Cai [2 ]
Han, Hong [2 ]
Yang, Yanning [2 ]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian, Peoples R China
[2] Xidian Univ, Xian, Peoples R China
[3] Xidian Univ, Sch Artificial Intelligence, Xian, Peoples R China
Keywords
Cross-modal retrieval; graph attention; self attention; generative adversarial network;
DOI
10.1109/ICBK.2019.00043
CLC number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Existing cross-modal retrieval methods are mainly constrained to the bimodal case. When applied to the multi-modal case, we need to train O(K^2) (K: number of modalities) separate models, which is inefficient and unable to exploit common information among multiple modalities. Though some studies focused on learning a common space of multiple modalities for retrieval, they assumed data to be i.i.d. and failed to learn the underlying semantic structure which could be important for retrieval. To tackle this issue, we propose an extensive Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval (AGAT). AGAT synthesizes a self-attention network (SAT), a graph attention network (GAT) and a multi-modal generative adversarial network (MGAN). The SAT generates high-level embeddings for data items from different modalities, with self-attention capturing feature-level correlations in each modality. The GAT then uses attention to aggregate embeddings of matched items from different modalities to build a common embedding space. The MGAN aims to "cluster" matched embeddings of different modalities in the common space by forcing them to be similar to the aggregation. Finally, we train the common space so that it captures the semantic structure by constraining within-class/between-class distances. Experiments on three datasets show the effectiveness of AGAT.
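The abstract describes a three-part architecture: a self-attention encoder per modality (SAT), an attention-based aggregation over the matched modality embeddings (GAT), and an adversarial component (MGAN) that pulls each modality's embedding toward the aggregation. The sketch below is only an illustration of that wiring in PyTorch: every class name, dimension, and the toy forward pass are assumptions inferred from the abstract, not the authors' implementation, and the within-class/between-class distance loss on the common space is omitted.

# Minimal, hypothetical sketch of the SAT / GAT / MGAN wiring described in the
# abstract. Names and dimensions are illustrative assumptions, not AGAT's code.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """SAT-style encoder: self-attention over the feature tokens of one modality."""
    def __init__(self, feat_dim, embed_dim, num_heads=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x):                      # x: (batch, tokens, feat_dim)
        h = self.proj(x)
        h, _ = self.attn(h, h, h)              # feature-level self-attention
        return h.mean(dim=1)                   # (batch, embed_dim) item embedding

class CrossModalAggregator(nn.Module):
    """GAT-style step: attention-weighted aggregation of K matched embeddings."""
    def __init__(self, embed_dim):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, embeds):                 # embeds: (batch, K, embed_dim)
        w = torch.softmax(self.score(embeds), dim=1)
        return (w * embeds).sum(dim=1)         # aggregated common-space embedding

class Discriminator(nn.Module):
    """MGAN discriminator: separates per-modality embeddings from the aggregation."""
    def __init__(self, embed_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU(),
                                 nn.Linear(embed_dim, 1))

    def forward(self, z):
        return self.net(z)                     # real/fake logit

if __name__ == "__main__":
    # Toy forward pass with K = 3 modalities sharing a 128-d common space.
    encoders = nn.ModuleList([ModalityEncoder(d, 128) for d in (512, 300, 64)])
    aggregator, disc = CrossModalAggregator(128), Discriminator(128)
    feats = [torch.randn(8, 10, d) for d in (512, 300, 64)]    # matched items
    embeds = torch.stack([enc(x) for enc, x in zip(encoders, feats)], dim=1)
    common = aggregator(embeds)                # (8, 128)
    modality_logits = disc(embeds.flatten(0, 1))
    aggregate_logits = disc(common)
    print(common.shape, modality_logits.shape, aggregate_logits.shape)

Under this reading, the discriminator would be trained to distinguish per-modality embeddings from the GAT aggregation while the encoders are trained to fool it, which is one plausible instantiation of the "clustering" behaviour the abstract attributes to the MGAN.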
Pages: 265-272
Number of pages: 8
Related papers
50 records in total
  • [1] Adversarial Graph Convolutional Network for Cross-Modal Retrieval
    Dong, Xinfeng
    Liu, Li
    Zhu, Lei
    Nie, Liqiang
    Zhang, Huaxiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (03) : 1634 - 1645
  • [2] Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval
    Zeng, Yawen
    Cao, Da
    Wei, Xiaochi
    Liu, Meng
    Zhao, Zhou
    Qin, Zheng
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 2215 - 2224
  • [3] Multi-modal Subspace Learning with Joint Graph Regularization for Cross-modal Retrieval
    Wang, Kaiye
    Wang, Wei
    He, Ran
    Wang, Liang
    Tan, Tieniu
    2013 SECOND IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR 2013), 2013, : 236 - 240
  • [4] Multi-modal and cross-modal for lecture videos retrieval
    Nhu Van Nguyen
Coustaty, Mickaël
    Ogier, Jean-Marc
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 2667 - 2672
  • [5] Unsupervised Multi-modal Hashing for Cross-Modal Retrieval
    Yu, Jun
    Wu, Xiao-Jun
    Zhang, Donglin
    COGNITIVE COMPUTATION, 2022, 14 (03) : 1159 - 1171
  • [6] Unsupervised Multi-modal Hashing for Cross-Modal Retrieval
    Jun Yu
    Xiao-Jun Wu
    Donglin Zhang
    Cognitive Computation, 2022, 14 : 1159 - 1171
  • [7] Cross-Modal Retrieval Augmentation for Multi-Modal Classification
    Gur, Shir
    Neverova, Natalia
    Stauffer, Chris
    Lim, Ser-Nam
    Kiela, Douwe
    Reiter, Austin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 111 - 123
  • [8] Multi-modal semantic autoencoder for cross-modal retrieval
    Wu, Yiling
    Wang, Shuhui
    Huang, Qingming
    NEUROCOMPUTING, 2019, 331 : 165 - 175
  • [9] Cross-modal attention for multi-modal image registration
    Song, Xinrui
    Chao, Hanqing
    Xu, Xuanang
    Guo, Hengtao
    Xu, Sheng
    Turkbey, Baris
    Wood, Bradford J.
    Sanford, Thomas
    Wang, Ge
    Yan, Pingkun
    MEDICAL IMAGE ANALYSIS, 2022, 82
  • [10] Iterative graph attention memory network for cross-modal retrieval
    Dong, Xinfeng
    Zhang, Huaxiang
    Dong, Xiao
    Lu, Xu
    KNOWLEDGE-BASED SYSTEMS, 2021, 226