Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval

Cited: 26
|
Authors
Cheng, Qingrong [1 ]
Gu, Xiaodong [1 ]
Affiliations
[1] Fudan Univ, Dept Elect Engn, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; Common space learning; Cross-modal graph; Graph representation learning network; Feature transfer learning network; Graph embedding;
DOI
10.1016/j.neunet.2020.11.011
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Information retrieval across different modalities has become a significant problem with many promising applications. However, the inconsistent feature representations of different multimedia data cause a "heterogeneity gap" among modalities, which is a central challenge in cross-modal retrieval. To bridge this gap, popular methods project the original data into a common representation space, which requires strong fitting ability from the model. To address this issue, we propose a novel Graph Representation Learning (GRL) method that does not project the original features into an aligned representation space but instead adopts a cross-modal graph to link the different modalities. The GRL approach consists of two subnetworks: a Feature Transfer Learning Network (FTLN) and a Graph Representation Learning Network (GRLN). First, the FTLN finds a latent space for each modality in which cosine similarity is suitable for describing item similarity. Then, we build a cross-modal graph to reconstruct the original data and their relationships. Finally, we abandon the features in the latent space and instead embed the graph vertices directly into a common representation space. In this way, the proposed method bypasses the most challenging issue by using the cross-modal graph as an intermediary agent to bridge the "heterogeneity gap" among different modalities, an approach that is simple but effective. Extensive experimental results on six widely used datasets indicate that the proposed GRL outperforms other state-of-the-art cross-modal retrieval methods. (C) 2020 Elsevier Ltd. All rights reserved.
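The abstract's pipeline (latent features per modality, a cross-modal graph weighted by cosine similarity, then embedding the graph vertices into a common space) can be illustrated with a minimal sketch. This is an assumption-laden toy using random features and a plain spectral embedding, not the paper's trained FTLN/GRLN networks:

```python
import numpy as np

def cosine_sim(a, b):
    """Row-wise cosine similarity between two feature matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy latent features for two modalities (e.g., 4 images and 4 texts),
# standing in for the output of a feature-transfer step.
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
txt = rng.normal(size=(4, 8))

# Cross-modal graph: vertices are all 8 items; edges connect items of
# different modalities, weighted by (clipped) cosine similarity.
sim = cosine_sim(img, txt)
adj = np.zeros((8, 8))
adj[:4, 4:] = np.maximum(sim, 0)  # keep only positive-similarity edges
adj[4:, :4] = adj[:4, 4:].T       # undirected graph: symmetric adjacency

# Embed the graph vertices into a shared 2-D space via the graph
# Laplacian's low-frequency eigenvectors (a simple spectral embedding).
lap = np.diag(adj.sum(axis=1)) - adj
vals, vecs = np.linalg.eigh(lap)
embedding = vecs[:, 1:3]  # skip the constant eigenvector; shape (8, 2)
```

Here, both modalities end up as rows of one `embedding` matrix, so retrieval reduces to nearest-neighbor search in the common space; the paper's GRLN learns this embedding rather than computing it spectrally.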
Pages: 143-162
Page count: 20
Related Papers
50 records
  • [21] Kernelized Cross-Modal Hashing for Multimedia Retrieval
    Tan, Shoubiao
    Hu, Lingyu
    Wang-Xu, Anqi
    Tang, Jun
    Jia, Zhaohong
    PROCEEDINGS OF THE 2016 12TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2016, : 1224 - 1228
  • [22] CONTINUUM REGRESSION FOR CROSS-MODAL MULTIMEDIA RETRIEVAL
    Chen, Yongming
    Wang, Liang
    Wang, Wei
    Zhang, Zhang
    2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 1949 - 1952
  • [23] Cross-Modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Representation Learning
    Guerrero, Ricardo
    Pham, Hai X.
    Pavlovic, Vladimir
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 3192 - 3201
  • [24] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [25] Cross-Modal Retrieval with Improved Graph Convolution
    Zhang, Hongtu
    Hua, Chunjian
    Jiang, Yi
    Yu, Jianfeng
    Chen, Ying
    COMPUTER ENGINEERING AND APPLICATIONS, 2024, 60 (11) : 95 - 104
  • [26] Cross-Modal Retrieval with Heterogeneous Graph Embedding
    Chen, Dapeng
    Wang, Min
    Chen, Haobin
    Wu, Lin
    Qin, Jing
    Peng, Wei
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3291 - 3300
  • [27] Learning to rank with relational graph and pointwise constraint for cross-modal retrieval
    Qingzhen Xu
    Miao Li
    Mengjing Yu
    Soft Computing, 2019, 23 : 9413 - 9427
  • [28] Learning to rank with relational graph and pointwise constraint for cross-modal retrieval
    Xu, Qingzhen
    Li, Miao
    Yu, Mengjing
    SOFT COMPUTING, 2019, 23 (19) : 9413 - 9427
  • [29] Cross-Modal Guided Visual Representation Learning for Social Image Retrieval
    Guan, Ziyu
    Zhao, Wanqing
    Liu, Hongmin
    Nakashima, Yuta
    Babaguchi, Noboru
    He, Xiaofei
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (03) : 2186 - 2198
  • [30] Multi-modal Subspace Learning with Joint Graph Regularization for Cross-modal Retrieval
    Wang, Kaiye
    Wang, Wei
    He, Ran
    Wang, Liang
    Tan, Tieniu
    2013 SECOND IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR 2013), 2013, : 236 - 240