Saliency-Guided Attention Network for Image-Sentence Matching

Cited by: 83
Authors
Ji, Zhong [1]
Wang, Haoran [1]
Han, Jungong [2]
Pang, Yanwei [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[2] Univ Warwick, WMG Data Sci, Coventry, W Midlands, England
Funding
National Natural Science Foundation of China
Keywords
OBJECT DETECTION;
DOI
10.1109/ICCV.2019.00585
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper studies the task of matching images and sentences, where learning appropriate representations to bridge the semantic gap between image content and language appears to be the main challenge. Unlike previous approaches that predominantly deploy symmetrical architectures to represent both modalities, we introduce a Saliency-guided Attention Network (SAN) that builds an asymmetrical link between vision and language to efficiently learn fine-grained cross-modal correlations. The proposed SAN mainly includes three components: a saliency detector, a Saliency-weighted Visual Attention (SVA) module, and a Saliency-guided Textual Attention (STA) module. Concretely, the saliency detector provides the visual saliency information that drives both attention modules. Taking advantage of the saliency information, SVA is able to learn more discriminative visual features. By fusing the visual information from SVA and intra-modal information as multi-modal guidance, STA yields powerful textual representations that are synchronized with visual clues. Extensive experiments demonstrate that SAN improves the state-of-the-art results on the benchmark Flickr30K and MSCOCO datasets by a large margin.
Pages: 5753-5762
Page count: 10