Multimodal Knowledge Graph-guided Cross-Modal Graph Network for Image-text Retrieval

Cited: 0
Authors
Zheng, Juncheng [1 ]
Liang, Meiyu [1 ]
Yu, Yang [1 ]
Du, Junping [1 ]
Xue, Zhe [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, Natl Pilot Software Engn Sch, Beijing Key Lab Intelligent Commun Software & Mul, Beijing 100876, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal Fusion; Image-Text Retrieval; Multimodal Knowledge Graph;
DOI
10.1109/BigComp60711.2024.00024
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Numbers
081104; 0812; 0835; 1405;
Abstract
Image-text retrieval is a fundamental cross-modal task that aims to align the representation spaces of the image and text modalities. Existing cross-interactive image-text retrieval methods generate image and sentence embeddings independently, introduce interaction-based networks for cross-modal reasoning, and then perform retrieval with matching metrics. However, these approaches do not fully exploit the semantic relationships among multimodal knowledge to strengthen fine-grained implicit cross-modal semantic reasoning. In this paper, we propose the Multimodal Knowledge Graph-guided Cross-modal Graph Network (MKCGN), which exploits multimodal knowledge graphs to explore cross-modal relationships and enhance global representations. In MKCGN, each image yields a semantic graph and a spatial graph that together form its visual graph, and each sentence yields a textual graph built from word-level semantic relations; the visual and textual graphs support reasoning within their respective modalities. We then derive interest embeddings for image regions and text words from entity embeddings in a Multimodal Knowledge Graph (MKG), which approximately aligns the representation spaces of regions and words, enabling effective inter-modal interactions, and we learn fine-grained cross-modal communication through a graph node contrast loss for inter-modal semantic reasoning. Finally, we mine the implicit semantics and potential relationships of images and texts through the MKG to enhance the global representations, and use a cross-modal contrast loss to narrow the gap between coarse-grained cross-modal representations. Experiments on the MS-COCO and Flickr30K benchmark datasets show that the proposed MKCGN outperforms state-of-the-art image-text retrieval methods.
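The abstract's final alignment step uses a cross-modal contrast loss to pull matched image-text pairs together in the shared representation space. The record does not give the paper's exact formulation; the sketch below assumes a standard symmetric InfoNCE-style objective over a batch of paired embeddings (function names and the temperature value are illustrative, not taken from the paper):

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def cross_modal_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings.

    Matched pairs share the same row index; every other row in the
    batch serves as a negative for that pair.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(logits))             # true match sits on the diagonal

    def ce(lg):
        # Row-wise softmax cross-entropy against the diagonal targets.
        lg = lg - lg.max(axis=1, keepdims=True)
        log_prob = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_prob[labels, labels].mean()

    # Average the image-to-text and text-to-image retrieval directions.
    return 0.5 * (ce(logits) + ce(logits.T))
```

Minimizing this quantity pushes each image embedding toward its paired text embedding and away from the other texts in the batch, which is the coarse-grained space-narrowing effect the abstract describes.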
Pages: 97-100
Page count: 4
Related Papers
50 records in total
  • [1] Cross-modal Graph Matching Network for Image-text Retrieval
    Cheng, Yuhao
    Zhu, Xiaoguang
    Qian, Jiuchao
    Wen, Fei
    Liu, Peilin
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (04)
  • [2] Heterogeneous Graph Fusion Network for cross-modal image-text retrieval
    Qin, Xueyang
    Li, Lishuang
    Pang, Guangyao
    Hao, Fei
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [3] Cross-modal alignment with graph reasoning for image-text retrieval
    Cui, Zheng
    Hu, Yongli
    Sun, Yanfeng
    Gao, Junbin
    Yin, Baocai
    Multimedia Tools and Applications, 2022, 81 : 23615 - 23632
  • [4] Cross-modal alignment with graph reasoning for image-text retrieval
    Cui, Zheng
    Hu, Yongli
    Sun, Yanfeng
    Gao, Junbin
    Yin, Baocai
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (17) : 23615 - 23632
  • [5] SMAN: Stacked Multimodal Attention Network for Cross-Modal Image-Text Retrieval
    Ji, Zhong
    Wang, Haoran
    Han, Jungong
    Pang, Yanwei
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (02) : 1086 - 1097
  • [6] Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
    Wang, Sijin
    Wang, Ruiping
    Yao, Ziwei
    Shan, Shiguang
    Chen, Xilin
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1497 - 1506
  • [7] Image-Text Embedding with Hierarchical Knowledge for Cross-Modal Retrieval
    Seo, Sanghyun
    Kim, Juntae
    PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018), 2018, : 350 - 353
  • [8] Cross-modal independent matching network for image-text retrieval
    Ke, Xiao
    Chen, Baitao
    Yang, Xiong
    Cai, Yuhang
    Liu, Hao
    Guo, Wenzhong
    PATTERN RECOGNITION, 2025, 159
  • [9] Cross Attention Graph Matching Network for Image-Text Retrieval
    Yang, Xiaoyu
    Xie, Hao
    Mao, Junyi
    Wang, Zhiguo
    Yin, Guangqiang
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 274 - 286
  • [10] Multimodal Graph Learning for Cross-Modal Retrieval
    Xie, Jingyou
    Zhao, Zishuo
    Lin, Zhenzhou
    Shen, Ying
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 145 - 153