Saliency-Guided Attention Network for Image-Sentence Matching

Cited by: 83

Authors:

Ji, Zhong [1]

Wang, Haoran [1]

Han, Jungong [2]

Pang, Yanwei [1]

Affiliations:

[1] Tianjin University, School of Electrical and Information Engineering, Tianjin, People's Republic of China

[2] University of Warwick, WMG Data Science, Coventry, West Midlands, England

Source:

2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019) | 2019

Funding:

National Natural Science Foundation of China;

Keywords:

OBJECT DETECTION;

DOI:

10.1109/ICCV.2019.00585

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper studies the task of matching images and sentences, where the main challenge is learning representations that bridge the semantic gap between image content and language. Unlike previous approaches, which predominantly use a symmetrical architecture to represent both modalities, we introduce a Saliency-guided Attention Network (SAN) that builds an asymmetrical link between vision and language to efficiently learn a fine-grained cross-modal correlation. The proposed SAN has three main components: a saliency detector, a Saliency-weighted Visual Attention (SVA) module, and a Saliency-guided Textual Attention (STA) module. Concretely, the saliency detector provides the visual saliency information that drives both attention modules. Taking advantage of the saliency information, SVA is able to learn more discriminative visual features. By fusing the visual information from SVA with intra-modal information as multi-modal guidance, STA yields powerful textual representations that are synchronized with visual clues. Extensive experiments demonstrate that SAN improves on the state-of-the-art results on the benchmark Flickr30K and MSCOCO datasets by a large margin.
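The asymmetrical flow described in the abstract (saliency drives the visual attention, and the attended visual vector then guides the textual attention) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the softmax weighting, the dot-product guidance, and the toy feature vectors are all assumptions chosen for illustration, and the intra-modal part of the STA guidance is omitted.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    # Convex combination of equally sized vectors.
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def saliency_weighted_visual(region_feats, saliency_scores):
    # Hypothetical stand-in for SVA: regions with higher saliency
    # contribute more to the pooled image vector.
    return weighted_sum(softmax(saliency_scores), region_feats)

def visually_guided_text(word_feats, visual_vec):
    # Hypothetical stand-in for STA: words are weighted by their
    # dot-product agreement with the pooled image vector.
    weights = softmax([dot(w, visual_vec) for w in word_feats])
    return weighted_sum(weights, word_feats)

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy data (assumed): two image regions and two words in a 2-D feature space.
regions = [[1.0, 0.0], [0.0, 1.0]]
saliency = [2.0, 0.0]          # the first region is more salient
words = [[1.0, 0.0], [0.0, 1.0]]

image_vec = saliency_weighted_visual(regions, saliency)
text_vec = visually_guided_text(words, image_vec)
match_score = cosine(image_vec, text_vec)
```

In this toy setting the salient first region dominates the image vector, which in turn pulls the text vector toward the first word. The link is asymmetrical in the sense that information flows from vision to language and not the other way round.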

Pages: 5753-5762

Page count: 10

50 entries in total

[1] Decoupled Cross-Modal Phrase-Attention Network for Image-Sentence Matching
Shi, Zhangxiang
Zhang, Tianzhu
Wei, Xi
Wu, Feng
Zhang, Yongdong
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33: 1326-1337
[2] Saliency-guided Pairwise Matching
Huang, Shao
Wang, Weiqiang
PATTERN RECOGNITION LETTERS, 2017, 97: 37-43
[3] Saliency-Guided Image Translation
Jiang, Lai
Xu, Mai
Wang, Xiaofei
Sigal, Leonid
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 16504-16513
[4] Saliency-guided image translation
Jiang, Lai
Dai, Ning
Xu, Mai
Deng, Xin
Li, Shengxi
Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(10): 2689-2698
[5] SGSR: A SALIENCY-GUIDED IMAGE SUPER-RESOLUTION NETWORK
Kim, Dayeon
Kim, Munchurl
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023: 980-984
[6] GazeFusion: Saliency-Guided Image Generation
Zhang, Yunxiang
Wu, Nan
Lin, Connor Z.
Wetzstein, Gordon
Sun, Qi
ACM TRANSACTIONS ON APPLIED PERCEPTION, 2024, 21(04)
[7] SALIENCY-GUIDED IMAGE STYLE TRANSFER
Liu, Xiuwen
Liu, Zhi
Zhou, Xiaofei
Chen, Minyu
2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2019: 66-71
[8] Dynamic Pruning of Regions for Image-Sentence Matching
Wu, Jie
Liu, Weifeng
Wang, Leiquan
Shen, Xiuxuan
Wei, Yiwei
Wu, Chunlei
SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 117
[9] Partial-Duplicate Image Retrieval via Saliency-Guided Visual Matching
Li, Liang
Jiang, Shuqiang
Zha, Zheng-Jun
Wu, Zhipeng
Huang, Qingming
IEEE MULTIMEDIA, 2013, 20(03): 13-23
[10] Saliency-Guided Lighting
Lee, Chang Ha
Kim, Youngmin
Varshney, Amitabh
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2009, E92D(02): 369-373
