Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval

Cited by: 26
Authors
Cheng, Qingrong [1]
Gu, Xiaodong [1]
Affiliation
[1] Fudan Univ, Dept Elect Engn, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; Common space learning; Cross-modal graph; Graph representation learning network; Feature transfer learning network; Graph embedding
DOI
10.1016/j.neunet.2020.11.011
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Information retrieval across different modalities has become a significant problem with many promising applications. However, the inconsistent feature representations of different multimedia data create a "heterogeneity gap" among modalities, which is a central challenge in cross-modal retrieval. To bridge this gap, popular methods project the original data into a common representation space, which demands great fitting ability from the model. To address this issue, we propose a novel Graph Representation Learning (GRL) method that does not project the original features into an aligned representation space but instead adopts a cross-modal graph to link different modalities. GRL consists of two subnetworks: a Feature Transfer Learning Network (FTLN) and a Graph Representation Learning Network (GRLN). First, the FTLN finds a latent space for each modality in which cosine similarity is suitable for describing their similarity. Then, we build a cross-modal graph to reconstruct the original data and their relationships. Finally, we abandon the features in the latent space and instead embed the graph vertices directly into a common representation space. In this process, the proposed method bypasses the most challenging issue by using the cross-modal graph as an intermediary agent to bridge the "heterogeneity gap" among modalities, which is simple but effective. Extensive experiments on six widely used datasets show that the proposed GRL outperforms other state-of-the-art cross-modal retrieval methods. (C) 2020 Elsevier Ltd. All rights reserved.
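The pipeline described in the abstract (per-modality latent projection, cosine-similarity-based cross-modal graph construction, then embedding the graph vertices) can be illustrated with a minimal sketch. This is not the paper's trained networks: the linear projections, dimensions, and variable names below are assumptions chosen only to show how a cross-modal adjacency graph over image and text items might be assembled.

```python
import numpy as np

rng = np.random.default_rng(0)

def latent_projection(x, w):
    # Stand-in for FTLN: map modality-specific features into a
    # shared-dimension latent space (assumed linear here).
    return x @ w

def cosine_sim(a, b):
    # Pairwise cosine similarity between the rows of a and b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy features: 4 images (512-d) and 4 texts (300-d).
img = rng.standard_normal((4, 512))
txt = rng.standard_normal((4, 300))

# Project each modality into a common-dimension latent space (128-d).
w_img = rng.standard_normal((512, 128))
w_txt = rng.standard_normal((300, 128))
z_img = latent_projection(img, w_img)
z_txt = latent_projection(txt, w_txt)

# Cross-modal graph: vertices are all items from both modalities;
# edge weights are cosine similarities, including cross-modal edges.
cross = cosine_sim(z_img, z_txt)
adjacency = np.block([
    [cosine_sim(z_img, z_img), cross],
    [cross.T, cosine_sim(z_txt, z_txt)],
])
print(adjacency.shape)  # (8, 8)
```

In the actual method, a graph representation learning network would then embed these vertices into the common retrieval space; here the sketch stops at the graph, since the trained FTLN/GRLN parameters are not part of the abstract.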
Pages: 143-162
Number of pages: 20
Related papers
50 in total
  • [31] Content-based multimedia information retrieval via cross-modal querying
    Li, MK
    Li, DG
    Dimitrova, N
    Sethi, IK
    8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL X, PROCEEDINGS: SYSTEMICS AND INFORMATION SYSTEMS, TECHNOLOGIES AND APPLICATIONS, 2004, : 141 - 145
  • [32] Continual learning in cross-modal retrieval
    Wang, Kai
    Herranz, Luis
    van de Weijer, Joost
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3623 - 3633
  • [33] Learning DALTS for cross-modal retrieval
    Yu, Zheng
    Wang, Wenmin
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2019, 4 (01) : 9 - 16
  • [34] Sequential Learning for Cross-modal Retrieval
    Song, Ge
    Tan, Xiaoyang
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 4531 - 4539
  • [35] Deep Semantic Correlation Learning based Hashing for Multimedia Cross-Modal Retrieval
    Gong, Xiaolong
    Huang, Linpeng
    Wang, Fuwei
    2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018, : 117 - 126
  • [36] Combining Link and Content Correlation Learning for Cross-Modal Retrieval in Social Multimedia
    Zhang, Longtao
    Liu, Fangfang
    Zeng, Zhimin
    HUMAN CENTERED COMPUTING, HCC 2017, 2018, 10745 : 516 - 526
  • [37] Discrete Cross-Modal Hashing for Efficient Multimedia Retrieval
    Ma, Dekui
    Liang, Jian
    Kong, Xiangwei
    He, Ran
    Li, Ying
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2016, : 38 - 43
  • [38] On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval
    Costa Pereira, Jose
    Coviello, Emanuele
    Doyle, Gabriel
    Rasiwasia, Nikhil
    Lanckriet, Gert R. G.
    Levy, Roger
    Vasconcelos, Nuno
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (03) : 521 - 535
  • [39] Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval
    Cheng, Miaomiao
    Jing, Liping
    Ng, Michael K.
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2020, 38 (03)
  • [40] Cross-Modal Learning to Rank via Latent Joint Representation
    Wu, Fei
    Jiang, Xinyang
    Li, Xi
    Tang, Siliang
    Lu, Weiming
    Zhang, Zhongfei
    Zhuang, Yueting
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (05) : 1497 - 1509