Multimodal Knowledge Graph-guided Cross-Modal Graph Network for Image-text Retrieval

Cited by: 0
Authors
Zheng, Juncheng [1 ]
Liang, Meiyu [1 ]
Yu, Yang [1 ]
Du, Junping [1 ]
Xue, Zhe [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, Natl Pilot Software Engn Sch, Beijing Key Lab Intelligent Commun Software & Mul, Beijing 100876, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal Fusion; Image-Text Retrieval; Multimodal Knowledge Graph;
DOI
10.1109/BigComp60711.2024.00024
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image-text retrieval is a fundamental cross-modal task that aims to align the representation spaces of the image and text modalities. Existing cross-interactive image-text retrieval methods generate image and sentence embeddings independently, introduce interaction-based networks for cross-modal reasoning, and then retrieve with matching metrics. However, these approaches do not fully exploit the semantic relationships among multimodal knowledge to strengthen fine-grained implicit cross-modal semantic reasoning. In this paper, we propose the Multimodal Knowledge Graph-guided Cross-modal Graph Network (MKCGN), which exploits multimodal knowledge graphs to explore cross-modal relationships and enhance global representations. In MKCGN, each image yields a semantic graph and a spatial graph that together form its visual graph, while each sentence yields a textual graph built from word-level semantic relations; the visual and textual graphs support reasoning within their respective modalities. We then derive interest embeddings for image regions and text words from entity embeddings in a Multimodal Knowledge Graph (MKG), which approximately aligns the representation spaces of regions and words, enabling effective inter-modal interaction; a graph node contrastive loss learns fine-grained cross-modal correspondence for inter-modal semantic reasoning. Finally, we mine the implicit semantics and latent relationships of images and texts through the MKG to enhance the global representations, and use a cross-modal contrastive loss to narrow the gap between coarse-grained cross-modal representations. Experiments on the MS-COCO and Flickr30K benchmark datasets show that the proposed MKCGN outperforms state-of-the-art image-text retrieval methods.
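The abstract describes two core training signals: graph-based reasoning over region/word nodes within each modality, and a contrastive loss that pulls matched global image and text embeddings together. As a rough, hypothetical illustration only (this is not the authors' implementation; the similarity-graph construction, module names, pooling, and hyperparameters such as tau=0.07 are all assumptions), a minimal PyTorch sketch of these two pieces might look like:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphReasoning(nn.Module):
    """One round of intra-modal message passing over a dense similarity
    graph (a simplified stand-in for the paper's visual/textual graphs)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, nodes):  # nodes: (B, N, D) region or word features
        # attention-like adjacency from scaled pairwise dot products
        adj = torch.softmax(
            nodes @ nodes.transpose(1, 2) / nodes.size(-1) ** 0.5, dim=-1)
        # aggregate neighbor messages, residual update
        return F.relu(nodes + adj @ self.proj(nodes))

def cross_modal_contrastive(img, txt, tau=0.07):
    """Symmetric InfoNCE-style loss over pooled global embeddings;
    matched pairs sit on the diagonal of the similarity matrix."""
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)
    logits = img @ txt.t() / tau            # (B, B) similarities
    labels = torch.arange(img.size(0))      # i-th image matches i-th text
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2

# toy usage: 8 image-text pairs, 36 regions / 20 words, 256-dim features
regions, words = torch.randn(8, 36, 256), torch.randn(8, 20, 256)
v_reason, t_reason = GraphReasoning(256), GraphReasoning(256)
img_global = v_reason(regions).mean(dim=1)  # pool reasoned region nodes
txt_global = t_reason(words).mean(dim=1)    # pool reasoned word nodes
print(cross_modal_contrastive(img_global, txt_global).item())

The same contrastive form could in principle be applied at the node level (the paper's graph node contrast loss) by comparing MKG-aligned region and word embeddings instead of pooled globals.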
Pages: 97-100
Page count: 4