CGNN: Caption-assisted graph neural network for image-text retrieval

Cited by: 3

Authors:

Hu, Yongli [1]

Zhang, Hanfu [1]

Jiang, Huajie [1,2]

Bi, Yandong [1]

Yin, Baocai [1]

Affiliations:

[1] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China

[2] Beijing Univ Technol, Beijing 100124, Peoples R China

Source:

PATTERN RECOGNITION LETTERS | 2022 / Volume 161

Keywords:

Image-text retrieval; Cross-modal retrieval; Image captioning; Graph convolution

DOI:

10.1016/j.patrec.2022.08.002

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Image-text retrieval has drawn much attention in recent years, and the similarity measure between image and text plays an important role in it. Most existing works focus on learning global coarse-grained or local fine-grained features for similarity computation. However, the large domain gap between the two modalities is often neglected, which makes it difficult to match images and texts effectively. To address this problem, we propose using auxiliary information, in the form of generated image captions, to reduce the domain gap. A Caption-Assisted Graph Neural Network (CGNN) is then designed to learn the structured relationships among images, captions, and texts. Since the captions and the texts come from the same domain, the domain gap between images and texts can be effectively reduced. With the help of caption information, our model achieves excellent performance on two cross-modal retrieval datasets, Flickr30K and MS-COCO, which shows the effectiveness of our framework. (c) 2022 Elsevier B.V. All rights reserved.

Pages: 137-142

Number of pages: 6

50 items in total

[31] Dynamic Contrastive Distillation for Image-Text Retrieval
Rao, Jun
Ding, Liang
Qi, Shuhan
Fang, Meng
Liu, Yang
Shen, Li
Tao, Dacheng
IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8383 - 8395
[32] Semantic Completion and Filtration for Image-Text Retrieval
Yang, Song
Li, Qiang
Li, Wenhui
Li, Xuan-Ya
Jin, Ran
Lv, Bo
Wang, Rui
Liu, Anan
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (04)
[33] Multi-Task Visual Semantic Embedding Network for Image-Text Retrieval
Qin, Xue-Yang
Li, Li-Shuang
Tang, Jing-Yao
Hao, Fei
Ge, Mei-Ling
Pang, Guang-Yao
JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (04) : 811 - 826
[34] CMPD: Using Cross Memory Network With Pair Discrimination for Image-Text Retrieval
Wen, Xin
Han, Zhizhong
Liu, Yu-Shen
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (06) : 2427 - 2437
[35] Image-text bidirectional learning network based cross-modal retrieval
Li, Zhuoyi
Lu, Huibin
Fu, Hao
Gu, Guanghua
NEUROCOMPUTING, 2022, 483 : 148 - 159
[36] Multi-Layer Probabilistic Association Reasoning Network for Image-Text Retrieval
Li, Wenrui
Xiong, Ruiqin
Fan, Xiaopeng
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9706 - 9717
[37] Global-aware Fragment Representation Aggregation Network for image-text retrieval
Wang, Di
Tian, Jiabo
Liang, Xiao
Tian, Yumin
He, Lihuo
PATTERN RECOGNITION, 2025, 159
[38] Learning Aligned Image-Text Representations Using Graph Attentive Relational Network
Jing, Ya
Wang, Wei
Wang, Liang
Tan, Tieniu
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 1840 - 1852
[39] Entity Semantic Feature Fusion Network for Remote Sensing Image-Text Retrieval
Shui, Jianan
Ding, Shuaipeng
Li, Mingyong
Ma, Yan
WEB AND BIG DATA, APWEB-WAIM 2024, PT V, 2024, 14965 : 130 - 145
[40] Multi-scale motivated neural network for image-text matching
Qin, Xueyang
Li, Lishuang
Pang, Guangyao
MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (2) : 4383 - 4407
