Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval

Cited by: 26
Authors
Cheng, Qingrong [1]
Gu, Xiaodong [1]
Affiliation
[1] Fudan Univ, Dept Elect Engn, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; Common space learning; Cross-modal graph; Graph representation learning network; Feature transfer learning network; Graph embedding
DOI
10.1016/j.neunet.2020.11.011
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Information retrieval across different modalities has become a significant problem with many promising applications. However, the inconsistent feature representations of different multimedia data create a "heterogeneity gap" among modalities, which is a central challenge in cross-modal retrieval. To bridge this gap, popular methods project the original data into a common representation space, which demands great fitting ability from the model. To address this issue, we propose a novel Graph Representation Learning (GRL) method that does not project the original features into an aligned representation space but instead adopts a cross-modal graph to link different modalities. GRL consists of two subnetworks: a Feature Transfer Learning Network (FTLN) and a Graph Representation Learning Network (GRLN). First, the FTLN finds a latent space for each modality in which cosine similarity is suitable for describing their similarity. Then, we build a cross-modal graph to reconstruct the original data and their relationships. Finally, we abandon the features in the latent space and instead embed the graph vertices directly into a common representation space. In this process, the proposed method bypasses the most challenging issue by using the cross-modal graph as an intermediary agent to bridge the "heterogeneity gap" among modalities, which is simple but effective. Extensive experiments on six widely used datasets show that the proposed GRL outperforms other state-of-the-art cross-modal retrieval methods. (C) 2020 Elsevier Ltd. All rights reserved.
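The pipeline described in the abstract (per-modality latent projection, cosine-similarity-based cross-modal graph construction, then embedding the graph vertices) can be illustrated with a minimal sketch. This is not the paper's trained networks: the linear projections, dimensions, and variable names below are assumptions chosen only to show how a cross-modal adjacency graph over image and text items might be assembled.

```python
import numpy as np

rng = np.random.default_rng(0)

def latent_projection(x, w):
    # Stand-in for FTLN: map modality-specific features into a
    # shared-dimension latent space (assumed linear here).
    return x @ w

def cosine_sim(a, b):
    # Pairwise cosine similarity between the rows of a and b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy features: 4 images (512-d) and 4 texts (300-d).
img = rng.standard_normal((4, 512))
txt = rng.standard_normal((4, 300))

# Project each modality into a common-dimension latent space (128-d).
w_img = rng.standard_normal((512, 128))
w_txt = rng.standard_normal((300, 128))
z_img = latent_projection(img, w_img)
z_txt = latent_projection(txt, w_txt)

# Cross-modal graph: vertices are all items from both modalities;
# edge weights are cosine similarities, including cross-modal edges.
cross = cosine_sim(z_img, z_txt)
adjacency = np.block([
    [cosine_sim(z_img, z_img), cross],
    [cross.T, cosine_sim(z_txt, z_txt)],
])
print(adjacency.shape)  # (8, 8)
```

In the actual method, a graph representation learning network would then embed these vertices into the common retrieval space; here the sketch stops at the graph, since the trained FTLN/GRLN parameters are not part of the abstract.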
Pages: 143-162
Number of pages: 20
Related papers
50 in total
  • [31] Content-based multimedia information retrieval via cross-modal querying
    Li, MK
    Li, DG
    Dimitrova, N
    Sethi, IK
    8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL X, PROCEEDINGS: SYSTEMICS AND INFORMATION SYSTEMS, TECHNOLOGIES AND APPLICATIONS, 2004, : 141 - 145
  • [32] Continual learning in cross-modal retrieval
    Wang, Kai
    Herranz, Luis
    van de Weijer, Joost
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3623 - 3633
  • [33] Learning DALTS for cross-modal retrieval
    Yu, Zheng
    Wang, Wenmin
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2019, 4 (01) : 9 - 16
  • [34] Sequential Learning for Cross-modal Retrieval
    Song, Ge
    Tan, Xiaoyang
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 4531 - 4539
  • [35] Deep Semantic Correlation Learning based Hashing for Multimedia Cross-Modal Retrieval
    Gong, Xiaolong
    Huang, Linpeng
    Wang, Fuwei
    2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018, : 117 - 126
  • [36] Combining Link and Content Correlation Learning for Cross-Modal Retrieval in Social Multimedia
    Zhang, Longtao
    Liu, Fangfang
    Zeng, Zhimin
    HUMAN CENTERED COMPUTING, HCC 2017, 2018, 10745 : 516 - 526
  • [37] Discrete Cross-Modal Hashing for Efficient Multimedia Retrieval
    Ma, Dekui
    Liang, Jian
    Kong, Xiangwei
    He, Ran
    Li, Ying
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2016, : 38 - 43
  • [38] On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval
    Costa Pereira, Jose
    Coviello, Emanuele
    Doyle, Gabriel
    Rasiwasia, Nikhil
    Lanckriet, Gert R. G.
    Levy, Roger
    Vasconcelos, Nuno
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (03) : 521 - 535
  • [39] Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval
    Cheng, Miaomiao
    Jing, Liping
    Ng, Michael K.
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2020, 38 (03)
  • [40] Cross-Modal Learning to Rank via Latent Joint Representation
    Wu, Fei
    Jiang, Xinyang
    Li, Xi
    Tang, Siliang
    Lu, Weiming
    Zhang, Zhongfei
    Zhuang, Yueting
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (05) : 1497 - 1509