Saliency-Guided Attention Network for Image-Sentence Matching

Cited by: 83
Authors
Ji, Zhong [1]
Wang, Haoran [1]
Han, Jungong [2]
Pang, Yanwei [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[2] Univ Warwick, WMG Data Sci, Coventry, W Midlands, England
Funding
National Natural Science Foundation of China;
Keywords
OBJECT DETECTION;
DOI
10.1109/ICCV.2019.00585
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper studies the task of matching images and sentences, where learning appropriate representations to bridge the semantic gap between image content and language appears to be the main challenge. Unlike previous approaches that predominantly deploy a symmetrical architecture to represent both modalities, we introduce a Saliency-guided Attention Network (SAN) that is characterized by building an asymmetrical link between vision and language to efficiently learn a fine-grained cross-modal correlation. The proposed SAN mainly includes three components: a saliency detector, a Saliency-weighted Visual Attention (SVA) module, and a Saliency-guided Textual Attention (STA) module. Concretely, the saliency detector provides the visual saliency information that drives both attention modules. Taking advantage of the saliency information, SVA is able to learn more discriminative visual features. By fusing the visual information from SVA and intra-modal information as multi-modal guidance, STA affords powerful textual representations that are synchronized with visual clues. Extensive experiments demonstrate that SAN improves the state-of-the-art results on the benchmark Flickr30K and MSCOCO datasets by a large margin.
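The asymmetrical flow described in the abstract (saliency drives the visual features, and the visual features in turn guide the textual attention) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the formulas, so the function names, the normalised-saliency pooling, the additive fusion of visual and intra-modal guidance, and the dot-product scoring are all assumptions chosen only to make the data flow concrete.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(vectors, weights):
    """Sum of vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


def saliency_weighted_pool(region_feats, saliency):
    """Hypothetical stand-in for SVA: pool region features by saliency.

    Assumption: saliency holds one non-negative score per region and is
    normalised to sum to 1 before pooling.
    """
    total = sum(saliency)
    weights = [s / total for s in saliency]
    return weighted_sum(region_feats, weights)


def guided_text_attention(word_feats, visual_vec):
    """Hypothetical stand-in for STA: attend over words with a fused guide.

    Assumption: the multi-modal guidance is the visual vector plus the mean
    word feature (the intra-modal part), and words are scored by dot product.
    """
    n = len(word_feats)
    intra = weighted_sum(word_feats, [1.0 / n] * n)
    guide = [v + t for v, t in zip(visual_vec, intra)]
    weights = softmax([dot(w, guide) for w in word_feats])
    return weighted_sum(word_feats, weights), weights


def cosine(a, b):
    """Cosine similarity, used here as the image-sentence matching score."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]   # two toy region features
    saliency = [0.9, 0.1]                # first region is the salient one
    words = [[1.0, 0.0], [0.0, 1.0]]     # two toy word features

    visual = saliency_weighted_pool(regions, saliency)
    sentence, word_weights = guided_text_attention(words, visual)
    print("word weights:", word_weights)
    print("match score:", cosine(visual, sentence))
```

In this toy setting the word aligned with the salient region receives the larger attention weight, which is the kind of vision-to-language asymmetry the abstract describes. A real system would use learned projections and features from a trained detector rather than these hand-set vectors.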
Pages: 5753-5762
Page count: 10