CGNN: Caption-assisted graph neural network for image-text retrieval

Cited by: 3

Authors:

Hu, Yongli ^{[1]}

Zhang, Hanfu ^{[1]}

Jiang, Huajie ^{[1,2]}

Bi, Yandong ^{[1]}

Yin, Baocai ^{[1]}

Affiliations:

[1] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China

[2] Beijing Univ Technol, Beijing 100124, Peoples R China

Source:

PATTERN RECOGNITION LETTERS | 2022 / Vol. 161

Keywords:

Image-text retrieval; Cross-modal retrieval; Image captioning; Graph convolution

DOI:

10.1016/j.patrec.2022.08.002

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Image-text retrieval has drawn much attention in recent years, and the similarity measure between image and text plays an important role in it. Most existing works focus on learning global coarse-grained or local fine-grained features for similarity computation. However, the large domain gap between the two modalities is often neglected, which makes it difficult to match images and texts effectively. To deal with this problem, we propose to use auxiliary information, in the form of generated image captions, to reduce the domain gap. A Caption-Assisted Graph Neural Network (CGNN) is then designed to learn the structured relationships among images, captions, and texts. Since the captions and the texts are from the same domain, the domain gap between images and texts can be effectively reduced. With the help of caption information, our model achieves excellent performance on two cross-modal retrieval datasets, Flickr30K and MS-COCO, which shows the effectiveness of our framework. © 2022 Elsevier B.V. All rights reserved.


Pages: 137-142

Page count: 6

