CGNN: Caption-assisted graph neural network for image-text retrieval

Cited by: 3
Authors
Hu, Yongli [1]
Zhang, Hanfu [1]
Jiang, Huajie [1, 2]
Bi, Yandong [1]
Yin, Baocai [1]
Affiliations
[1] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[2] Beijing Univ Technol, Beijing 100124, Peoples R China
Keywords
Image-text retrieval; Cross-modal retrieval; Image captioning; Graph convolution
DOI
10.1016/j.patrec.2022.08.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image-text retrieval has drawn much attention in recent years, and the similarity measure between image and text plays an important role in it. Most existing works focus on learning global coarse-grained or local fine-grained features for similarity computation. However, the large domain gap between the two modalities is often neglected, which makes it difficult to match images and texts effectively. To deal with this problem, we propose to use auxiliary information, in the form of generated image captions, to reduce the domain gap. A Caption-Assisted Graph Neural Network (CGNN) is then designed to learn the structured relationships among images, captions, and texts. Since the captions and the texts are from the same domain, the domain gap between images and texts can be effectively reduced. With the help of caption information, our model achieves excellent performance on two cross-modal retrieval datasets, Flickr30K and MS-COCO, which shows the effectiveness of our framework. (c) 2022 Elsevier B.V. All rights reserved.
Pages: 137-142
Page count: 6