CGNN: Caption-assisted graph neural network for image-text retrieval

Cited by: 3
Authors
Hu, Yongli [1]
Zhang, Hanfu [1]
Jiang, Huajie [1, 2]
Bi, Yandong [1]
Yin, Baocai [1]
Affiliations
[1] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[2] Beijing Univ Technol, Beijing 100124, Peoples R China
Keywords
Image-text retrieval; Cross-modal retrieval; Image captioning; Graph convolution
DOI
10.1016/j.patrec.2022.08.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image-text retrieval has drawn much attention in recent years, and the similarity measure between image and text plays an important role in it. Most existing works focus on learning global coarse-grained or local fine-grained features for similarity computation. However, the large domain gap between the two modalities is often neglected, which makes it difficult to match images and texts effectively. To deal with this problem, we propose to use generated image captions as auxiliary information to reduce the domain gap. A Caption-Assisted Graph Neural Network (CGNN) is then designed to learn the structured relationships among images, captions, and texts. Since the captions and the texts are from the same domain, the domain gap between images and texts can be effectively reduced. With the help of caption information, our model achieves excellent performance on two cross-modal retrieval datasets, Flickr30K and MS-COCO, which shows the effectiveness of our framework. (c) 2022 Elsevier B.V. All rights reserved.
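The idea in the abstract, that a caption can act as a bridge between an image and a text, can be illustrated with a toy graph. The sketch below is not the CGNN architecture from the paper: it assumes three made-up 3-dimensional embeddings, a hand-written adjacency matrix in which the caption node links the image and text nodes, and a single unweighted mean-aggregation step in place of learned graph convolution layers. The names `cosine`, `propagate`, and `adjacency` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def propagate(features, adjacency):
    """One mean-aggregation step: each node takes the average of the
    features of its neighbours (row-normalised adjacency times features).
    No learned weights are used; this is an illustrative assumption."""
    dim = len(features[0])
    updated = []
    for row in adjacency:
        degree = sum(row)
        updated.append([
            sum(w * f[d] for w, f in zip(row, features)) / degree
            for d in range(dim)
        ])
    return updated


# Made-up embeddings; the image and the text start out orthogonal.
image = [1.0, 0.0, 0.0]
caption = [0.6, 0.8, 0.0]
text = [0.0, 1.0, 0.0]

# Node order: image, caption, text. Every node has a self-loop, and the
# caption is connected to both the image and the text.
adjacency = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

before = cosine(image, text)
updated = propagate([image, caption, text], adjacency)
after = cosine(updated[0], updated[2])

print(f"image-text similarity before propagation: {before:.2f}")
print(f"image-text similarity after propagation:  {after:.2f}")
```

In this toy setup the image and text nodes have no direct edge, so any increase in their similarity after propagation comes from the shared caption neighbour. A real model would learn the node features and edge weights from data.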
Pages: 137-142
Number of pages: 6
Related papers
50 in total
  • [21] A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval
    Manh-Duy Nguyen
    Binh T Nguyen
    Cathal Gurrin
    NEW TRENDS IN INTELLIGENT SOFTWARE METHODOLOGIES, TOOLS AND TECHNIQUES, 2021, 337 : 510 - 523
  • [22] Prototype local-global alignment network for image-text retrieval
    Meng, Lingtao
    Zhang, Feifei
    Zhang, Xi
    Xu, Changsheng
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (04) : 525 - 538
  • [23] HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval
    Wang, Shuhuai
    Liu, Zheng
    Pei, Xinlei
    Xu, Junhao
    SENSORS, 2023, 23 (05)
  • [24] Cross-modal independent matching network for image-text retrieval
    Ke, Xiao
    Chen, Baitao
    Yang, Xiong
    Cai, Yuhang
    Liu, Hao
    Guo, Wenzhong
    PATTERN RECOGNITION, 2025, 159
  • [25] Global Relation-Aware Attention Network for Image-Text Retrieval
    Cao, Jie
    Qian, Shengsheng
    Zhang, Huaiwen
    Fang, Quan
    Xu, Changsheng
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 19 - 28
  • [26] EENet: embedding enhancement network for compositional image-text retrieval using generated text
    Hur, Chan
    Park, Hyeyoung
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (16) : 49689 - 49705
  • [27] EENet: embedding enhancement network for compositional image-text retrieval using generated text
    Chan Hur
    Hyeyoung Park
    Multimedia Tools and Applications, 2024, 83 : 49689 - 49705
  • [28] Compositional Learning of Image-Text Query for Image Retrieval
    Anwaar, Muhammad Umer
    Labintcev, Egor
    Kleinsteuber, Martin
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1139 - 1148
  • [29] Kernel triplet loss for image-text retrieval
    Pan, Zhengxin
    Wu, Fangyu
    Zhang, Bailing
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2022, 33 (3-4)
  • [30] Reservoir Computing Transformer for Image-Text Retrieval
    Li, Wenrui
    Ma, Zhengyu
    Deng, Liang-Jian
    Wang, Penghong
    Shi, Jinqiao
    Fan, Xiaopeng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5605 - 5613