Context-aware relation enhancement and similarity reasoning for image-text retrieval

Cited by: 0

Authors:

Cui, Zheng [1]

Hu, Yongli [1,2]

Sun, Yanfeng [1]

Yin, Baocai [1]

Affiliations:

[1] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Fac Informat Technol, Beijing Key Lab Multimedia & Intelligent Software, Beijing, Peoples R China

[2] Beijing Univ Technol, 100 Pingleyuan, Beijing, Peoples R China

Source:

IET COMPUTER VISION | 2024 / Volume 18 / Issue 05

Funding:

National Key Research and Development Program of China;

Keywords:

image retrieval; multimedia systems;

DOI:

10.1049/cvi2.12270

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Image-text retrieval is a fundamental yet challenging task that aims to bridge the semantic gap between heterogeneous data in order to measure semantic similarity precisely. Fine-grained alignment between cross-modal features plays a key role in many successful methods. Nevertheless, existing methods cannot effectively utilise intra-modal information to enhance feature representation, and they lack powerful similarity reasoning to obtain a precise similarity score. To tackle these issues, a context-aware Relation Enhancement and Similarity Reasoning model, called RESR, is proposed, which conducts both intra-modal relation enhancement and inter-modal similarity reasoning while considering global-context information. For intra-modal relation enhancement, a novel context-aware graph convolutional network is introduced to enhance local feature representations by utilising relation and global-context information. For inter-modal similarity reasoning, local and global similarity features are extracted through the bidirectional alignment of image and text, and similarity reasoning is performed among the multi-granularity similarity features. Finally, the refined local and global similarity features are adaptively fused to obtain a precise similarity score. The experimental results show that the model outperforms some state-of-the-art approaches, achieving average improvements of 2.5% and 6.3% in R@sum on the Flickr30K and MS-COCO datasets, respectively.
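
The abstract reports results in R@sum without defining it. In the image-text retrieval literature this metric is commonly computed as the sum of Recall@1, Recall@5 and Recall@10 over both retrieval directions (image-to-text and text-to-image). The sketch below illustrates that common definition only; it is not taken from the paper. The function names `recall_at_k` and `r_sum` are hypothetical, and the code assumes a simplified setting with exactly one matching caption per image (row i matches column i), whereas Flickr30K and MS-COCO provide several captions per image.

```python
def recall_at_k(sim, gt, k):
    """Percentage of queries whose ground-truth candidate is in the top-k.

    sim[i][j] is the similarity of query i to candidate j;
    gt[i] is the index of the correct candidate for query i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank candidate indices by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if gt[i] in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(sim)


def r_sum(sim, ks=(1, 5, 10)):
    """Sum of Recall@K over both retrieval directions.

    Assumption (for illustration): sim is a square image-by-text matrix
    in which image i matches text i.
    """
    n = len(sim)
    gt = list(range(n))
    # Transpose to obtain text-to-image similarities.
    sim_t = [list(col) for col in zip(*sim)]
    return sum(recall_at_k(sim, gt, k) + recall_at_k(sim_t, gt, k) for k in ks)


if __name__ == "__main__":
    # Toy 3x3 similarity matrix (rows: images, columns: texts).
    sim = [
        [0.9, 0.1, 0.2],
        [0.8, 0.3, 0.7],
        [0.1, 0.2, 0.6],
    ]
    print(round(r_sum(sim), 2))
```

With six recall terms, each at most 100, a perfect model scores 600 under this definition.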


Pages: 652-665

Page count: 14

50 items in total

[1] Context-Aware Attention Network for Image-Text Retrieval
Zhang, Qi
Lei, Zhen
Zhang, Zhaoxiang
Li, Stan Z.
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 3533 - 3542
[2] Action-Aware Embedding Enhancement for Image-Text Retrieval
Li, Jiangtong
Niu, Li
Zhang, Liqing
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1323 - 1331
[3] Similarity Reasoning and Filtration for Image-Text Matching
Diao, Haiwen
Zhang, Ying
Ma, Lin
Lu, Huchuan
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1218 - 1226
[4] Global Relation-Aware Attention Network for Image-Text Retrieval
Cao, Jie
Qian, Shengsheng
Zhang, Huaiwen
Fang, Quan
Xu, Changsheng
PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 19 - 28
[5] Multi-view and region reasoning semantic enhancement for image-text retrieval
Cheng, Wengang
Han, Ziyi
He, Di
Wu, Lifang
MULTIMEDIA SYSTEMS, 2024, 30 (04)
[6] Context-Aware Multi-View Summarization Network for Image-Text Matching
Qu, Leigang
Liu, Meng
Cao, Da
Nie, Liqiang
Tian, Qi
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1047 - 1055
[7] Remote Sensing Image-Text Retrieval With Implicit-Explicit Relation Reasoning
Yang, Lingling
Zhou, Tongqing
Ma, Wentao
Du, Mengze
Liu, Lu
Li, Feng
Zhao, Shan
Wang, Yuwei
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
[8] Transformer Reasoning Network for Image-Text Matching and Retrieval
Messina, Nicola
Falchi, Fabrizio
Esuli, Andrea
Amato, Giuseppe
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5222 - 5229
[9] Dual-Level Representation Enhancement on Characteristic and Context for Image-Text Retrieval
Yang, Song
Li, Qiang
Li, Wenhui
Li, Xuanya
Liu, An-An
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (11) : 8037 - 8050
[10] Cross-modal information balance-aware reasoning network for image-text retrieval
Qin, Xueyang
Li, Lishuang
Hao, Fei
Pang, Guangyao
Wang, Zehao
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 120