Unifying knowledge iterative dissemination and relational reconstruction network for image-text matching

Cited by: 24
Authors
Xie, Xiumin [1]
Li, Zhixin [1]
Tang, Zhenjun [1]
Yao, Dan [1]
Ma, Huifang [2]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image-text matching; Semantic knowledge; Similarity representation learning; Similarity-relation learning; Graph neural network; Attention
DOI
10.1016/j.ipm.2022.103154
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Image-text matching is a crucial branch of multimedia retrieval that relies on learning inter-modal correspondences. Most existing methods focus on either global or local correspondence and fail to explore fine-grained global-local alignment. Moreover, how to infer more accurate similarity scores remains an open problem. In this study, we propose a novel network that unifies knowledge iterative dissemination and relational reconstruction (KIDRR) for image-text matching. Specifically, a knowledge graph iterative dissemination module iteratively broadcasts global semantic knowledge so that relevant nodes become associated, yielding fine-grained intra-modal correlations and features. On this basis, vector-based similarity representations are learned from multiple perspectives to model multi-level alignments comprehensively. A relation graph reconstruction module is further developed to enhance cross-modal correspondences by constructing similarity relation graphs and adaptively reconstructing them. We conducted experiments on the Flickr30K and MSCOCO datasets, which contain 31,783 and 123,287 images, respectively. The results show that KIDRR improves Recall@1 by nearly 2.2% on Flickr30K and 1.6% on MSCOCO compared with the current state-of-the-art baselines.
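The gains above are reported in Recall@1 (R@K with K = 1), the usual metric on Flickr30K and MSCOCO. The following is a minimal sketch of how R@K can be computed in both retrieval directions, not the authors' evaluation code. It assumes a precomputed image-caption similarity matrix `sims[i][j]` and the common layout in which each image's captions (five per image in both datasets) are stored contiguously, so caption `j` belongs to image `j // caps_per_image`. The function names and the `caps_per_image` parameter are illustrative assumptions.

```python
from typing import Sequence


def recall_at_k_i2t(sims: Sequence[Sequence[float]], k: int, caps_per_image: int = 5) -> float:
    """Image-to-text Recall@K in percent.

    Assumption: sims[i][j] scores image i against caption j, and caption j
    belongs to image j // caps_per_image.
    """
    hits = 0
    for i, row in enumerate(sims):
        # Rank every caption for image i by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        # A query counts as a hit if ANY of its ground-truth captions is in the top k.
        if any(j // caps_per_image == i for j in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(sims)


def recall_at_k_t2i(sims: Sequence[Sequence[float]], k: int, caps_per_image: int = 5) -> float:
    """Text-to-image Recall@K in percent (each caption has exactly one ground-truth image)."""
    n_images, n_caps = len(sims), len(sims[0])
    hits = 0
    for j in range(n_caps):
        # Rank every image for caption j by descending similarity.
        ranked = sorted(range(n_images), key=lambda i: sims[i][j], reverse=True)
        if j // caps_per_image in ranked[:k]:
            hits += 1
    return 100.0 * hits / n_caps


if __name__ == "__main__":
    # Toy example: 2 images with 1 caption each.
    sims = [[0.9, 0.2],
            [0.8, 0.3]]
    print(recall_at_k_i2t(sims, 1, caps_per_image=1))  # 50.0
    print(recall_at_k_t2i(sims, 1, caps_per_image=1))  # 100.0
```

In the toy example, image 1 ranks caption 0 above its own caption, so image-to-text R@1 is 50%, while each caption still ranks its own image first. Python's `sorted` is stable, so ties keep index order; real evaluation code may break ties differently.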
Pages: 16
Related papers (50 in total)
  • [41] Multi-Modal Memory Enhancement Attention Network for Image-Text Matching
    Ji, Zhong
    Lin, Zhigang
    Wang, Haoran
    He, Yuqing
    IEEE ACCESS, 2020, 8 : 38438 - 38447
  • [42] ATTEND, CORRECT AND FOCUS: A BIDIRECTIONAL CORRECT ATTENTION NETWORK FOR IMAGE-TEXT MATCHING
    Liu, Yang
    Wang, Huaqiu
    Meng, Fanyang
    Liu, Mengyuan
    Liu, Hong
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2673 - 2677
  • [43] Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching
    Liu, Chunxiao
    Mao, Zhendong
    Liu, An-An
    Zhang, Tianzhu
    Wang, Bin
    Zhang, Yongdong
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 3 - 11
  • [44] FA-IATI: A Framework of Frequency Adaptive and Iterative Attention Interaction for Image-Text Matching
    Qin, Youxuan
    Zhao, Jing
    Li, Ming
    Sun, Chao
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [45] IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
    Chen, Hui
    Ding, Guiguang
    Liu, Xudong
    Lin, Zijia
    Liu, Ji
    Han, Jungong
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 12652 - 12660
  • [46] MiC: Image-text Matching in Circles with cross-modal generative knowledge enhancement
    Pu, Xiao
    Chen, Yuwen
    Yuan, Lin
    Zhang, Yan
    Li, Hongbo
    Jing, Liping
    Gao, Xinbo
    KNOWLEDGE-BASED SYSTEMS, 2024, 289
  • [47] Hierarchical Knowledge-Based Graph Embedding Model for Image-Text Matching in IoTs
    Zhang, Lizong
    Li, Meng
    Yan, Ke
    Wang, Ruozhou
    Hui, Bei
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (12) : 9399 - 9409
  • [48] Relational Distant Supervision for Image Captioning without Image-Text Pairs
    Qi, Yayun
    Zhao, Wentian
    Wu, Xinxiao
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, : 4524 - 4532
  • [49] Hashing based Efficient Inference for Image-Text Matching
    Tu, Rong-Cheng
    Ji, Lei
    Luo, Huaishao
    Shi, Botian
    Huang, Heyan
    Duan, Nan
    Mao, Xian-Ling
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 743 - 752
  • [50] Towards Deconfounded Image-Text Matching with Causal Inference
    Li, Wenhui
    Su, Xinqi
    Song, Dan
    Wang, Lanjun
    Zhang, Kun
    Liu, An-An
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6264 - 6273