CGNN: Caption-assisted graph neural network for image-text retrieval

Cited by: 3
Authors
Hu, Yongli [1]
Zhang, Hanfu [1]
Jiang, Huajie [1,2]
Bi, Yandong [1]
Yin, Baocai [1]
Affiliations
[1] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[2] Beijing Univ Technol, Beijing 100124, Peoples R China
Keywords
Image-text retrieval; Cross-modal retrieval; Image captioning; Graph convolution
DOI
10.1016/j.patrec.2022.08.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image-text retrieval has drawn much attention in recent years, and the similarity measure between image and text plays an important role in it. Most existing works focus on learning global coarse-grained or local fine-grained features for similarity computation. However, they often neglect the large domain gap between the two modalities, which makes it difficult to match images and texts effectively. To deal with this problem, we propose to use auxiliary information to narrow the domain gap: captions are generated for the images. A Caption-Assisted Graph Neural Network (CGNN) is then designed to learn the structured relationships among images, captions, and texts. Since the captions and the texts come from the same domain, the domain gap between images and texts can be effectively reduced. With the help of caption information, our model achieves excellent performance on two cross-modal retrieval datasets, Flickr30K and MS-COCO, which shows the effectiveness of our framework.
© 2022 Elsevier B.V. All rights reserved.
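The idea of letting a generated caption bridge an image and a text can be illustrated with a generic graph-convolution step. The sketch below is not the CGNN architecture from the paper; it assumes a standard GCN propagation rule (symmetric normalization D^{-1/2}(A + I)D^{-1/2}, then ReLU), an identity weight matrix, and hypothetical 2-d toy feature vectors for one image node, one generated-caption node, and one query-text node. The edges (image-caption, caption-text) are likewise an assumption chosen for illustration.

```python
import math


def normalize_adjacency(adj):
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(normalize_adjacency(adj), feats), weight)
    return [[max(0.0, x) for x in row] for row in out]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical toy features: node 0 = image, node 1 = generated caption, node 2 = query text.
feats = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
# Assumed edges: image-caption and caption-text (no direct image-text edge).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

before = cosine(feats[0], feats[2])      # orthogonal toy vectors -> 0.0
updated = gcn_layer(adj, feats, identity)
after = cosine(updated[0], updated[2])   # caption pulls image and text closer
print(f"before={before:.3f} after={after:.3f}")
```

In this toy setting the image and text vectors start orthogonal, and after one propagation step their similarity rises to roughly 0.65 because both nodes absorb part of the shared caption node's feature. A learned model would replace the identity weight with trained parameters and real encoder features.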
Pages: 137-142
Number of pages: 6
Related papers
  • [31] Dynamic Contrastive Distillation for Image-Text Retrieval
    Rao, Jun
    Ding, Liang
    Qi, Shuhan
    Fang, Meng
    Liu, Yang
    Shen, Li
    Tao, Dacheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8383 - 8395
  • [32] Semantic Completion and Filtration for Image-Text Retrieval
    Yang, Song
    Li, Qiang
    Li, Wenhui
    Li, Xuan-Ya
    Jin, Ran
    Lv, Bo
    Wang, Rui
    Liu, Anan
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (04)
  • [33] Multi-Task Visual Semantic Embedding Network for Image-Text Retrieval
    Qin, Xue-Yang
    Li, Li-Shuang
    Tang, Jing-Yao
    Hao, Fei
    Ge, Mei-Ling
    Pang, Guang-Yao
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (04) : 811 - 826
  • [34] CMPD: Using Cross Memory Network With Pair Discrimination for Image-Text Retrieval
    Wen, Xin
    Han, Zhizhong
    Liu, Yu-Shen
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (06) : 2427 - 2437
  • [35] Image-text bidirectional learning network based cross-modal retrieval
    Li, Zhuoyi
    Lu, Huibin
    Fu, Hao
    Gu, Guanghua
    NEUROCOMPUTING, 2022, 483 : 148 - 159
  • [36] Multi-Layer Probabilistic Association Reasoning Network for Image-Text Retrieval
    Li, Wenrui
    Xiong, Ruiqin
    Fan, Xiaopeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9706 - 9717
  • [37] Global-aware Fragment Representation Aggregation Network for image-text retrieval
    Wang, Di
    Tian, Jiabo
    Liang, Xiao
    Tian, Yumin
    He, Lihuo
    PATTERN RECOGNITION, 2025, 159
  • [38] Learning Aligned Image-Text Representations Using Graph Attentive Relational Network
    Jing, Ya
    Wang, Wei
    Wang, Liang
    Tan, Tieniu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 1840 - 1852
  • [39] Entity Semantic Feature Fusion Network for Remote Sensing Image-Text Retrieval
    Shui, Jianan
    Ding, Shuaipeng
    Li, Mingyong
    Ma, Yan
    WEB AND BIG DATA, APWEB-WAIM 2024, PT V, 2024, 14965 : 130 - 145
  • [40] Multi-scale motivated neural network for image-text matching
    Qin, Xueyang
    Li, Lishuang
    Pang, Guangyao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (2) : 4383 - 4407