Multimodal Knowledge Graph-guided Cross-Modal Graph Network for Image-text Retrieval

Cited by: 0
Authors
Zheng, Juncheng [1 ]
Liang, Meiyu [1 ]
Yu, Yang [1 ]
Du, Junping [1 ]
Xue, Zhe [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, Natl Pilot Software Engn Sch, Beijing Key Lab Intelligent Commun Software & Mul, Beijing 100876, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal Fusion; Image-Text Retrieval; Multimodal Knowledge Graph;
DOI
10.1109/BigComp60711.2024.00024
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image-text retrieval is a fundamental cross-modal task that aims to align the representation spaces of the image and text modalities. Existing cross-interactive image-text retrieval methods generate image and sentence embeddings independently, introduce interaction-based networks for cross-modal reasoning, and then retrieve with matching metrics. However, these approaches do not fully exploit the semantic relationships among multimodal knowledge to strengthen fine-grained implicit cross-modal semantic reasoning. In this paper, we propose the Multimodal Knowledge Graph-guided Cross-modal Graph Network (MKCGN), which exploits multimodal knowledge graphs to explore cross-modal relationships and enhance global representations. In MKCGN, each image yields a semantic graph and a spatial graph that together form its visual graph, while each sentence yields a textual graph built from word-level semantic relations; the visual and textual graphs support reasoning within their respective modalities. We then derive interest embeddings for image regions and text words from entity embeddings in a Multimodal Knowledge Graph (MKG), which approximately aligns the representation spaces of regions and words, enabling effective inter-modal interaction; a graph node contrastive loss learns fine-grained cross-modal correspondence for inter-modal semantic reasoning. Finally, we mine the implicit semantics and latent relationships of images and texts through the MKG to enhance the global representations, and use a cross-modal contrastive loss to narrow the gap between coarse-grained cross-modal representations. Experiments on the MS-COCO and Flickr30K benchmark datasets show that the proposed MKCGN outperforms state-of-the-art image-text retrieval methods.
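The abstract describes two core training signals: graph-based reasoning over region/word nodes within each modality, and a contrastive loss that pulls matched global image and text embeddings together. As a rough, hypothetical illustration only (this is not the authors' implementation; the similarity-graph construction, module names, pooling, and hyperparameters such as tau=0.07 are all assumptions), a minimal PyTorch sketch of these two pieces might look like:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphReasoning(nn.Module):
    """One round of intra-modal message passing over a dense similarity
    graph (a simplified stand-in for the paper's visual/textual graphs)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, nodes):  # nodes: (B, N, D) region or word features
        # attention-like adjacency from scaled pairwise dot products
        adj = torch.softmax(
            nodes @ nodes.transpose(1, 2) / nodes.size(-1) ** 0.5, dim=-1)
        # aggregate neighbor messages, residual update
        return F.relu(nodes + adj @ self.proj(nodes))

def cross_modal_contrastive(img, txt, tau=0.07):
    """Symmetric InfoNCE-style loss over pooled global embeddings;
    matched pairs sit on the diagonal of the similarity matrix."""
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)
    logits = img @ txt.t() / tau            # (B, B) similarities
    labels = torch.arange(img.size(0))      # i-th image matches i-th text
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2

# toy usage: 8 image-text pairs, 36 regions / 20 words, 256-dim features
regions, words = torch.randn(8, 36, 256), torch.randn(8, 20, 256)
v_reason, t_reason = GraphReasoning(256), GraphReasoning(256)
img_global = v_reason(regions).mean(dim=1)  # pool reasoned region nodes
txt_global = t_reason(words).mean(dim=1)    # pool reasoned word nodes
print(cross_modal_contrastive(img_global, txt_global).item())

The same contrastive form could in principle be applied at the node level (the paper's graph node contrast loss) by comparing MKG-aligned region and word embeddings instead of pooled globals.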
Pages: 97-100
Page count: 4