Saliency-Guided Attention Network for Image-Sentence Matching

Cited by: 83
Authors
Ji, Zhong [1]
Wang, Haoran [1]
Han, Jungong [2]
Pang, Yanwei [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[2] Univ Warwick, WMG Data Sci, Coventry, W Midlands, England
Funding
National Natural Science Foundation of China
Keywords
OBJECT DETECTION;
DOI
10.1109/ICCV.2019.00585
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper studies the task of matching images and sentences, where the main challenge is learning representations that bridge the semantic gap between image content and language. Unlike previous approaches, which predominantly deploy a symmetrical architecture to represent both modalities, we introduce a Saliency-guided Attention Network (SAN) that builds an asymmetrical link between vision and language to efficiently learn a fine-grained cross-modal correlation. The proposed SAN has three main components: a saliency detector, a Saliency-weighted Visual Attention (SVA) module, and a Saliency-guided Textual Attention (STA) module. Concretely, the saliency detector provides the visual saliency information that drives both attention modules. Taking advantage of the saliency information, SVA is able to learn more discriminative visual features. By fusing the visual information from SVA with intra-modal information as multi-modal guidance, STA yields powerful textual representations that are synchronized with visual clues. Extensive experiments demonstrate that SAN improves on the state-of-the-art results on the benchmark Flickr30K and MSCOCO datasets by a large margin.
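The asymmetrical flow described in the abstract (saliency drives the visual representation, which in turn guides the textual one) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the softmax pooling, the dot-product guidance, and the cosine score are all assumptions chosen for illustration, and the sketch omits the intra-modal information that STA also fuses.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    # Combine equal-length vectors using the given weights.
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0

def saliency_weighted_visual(region_feats, saliency):
    # Hypothetical stand-in for SVA: pool region features, weighting
    # each region by its softmax-normalised saliency score.
    return weighted_sum(softmax(saliency), region_feats)

def visually_guided_text(word_feats, visual_vec):
    # Hypothetical stand-in for STA: attend to words by their
    # dot-product agreement with the pooled visual vector.
    scores = [dot(w, visual_vec) for w in word_feats]
    return weighted_sum(softmax(scores), word_feats)

def match_score(region_feats, saliency, word_feats):
    # Asymmetric pipeline: saliency -> visual vector -> text vector.
    v = saliency_weighted_visual(region_feats, saliency)
    t = visually_guided_text(word_feats, v)
    return cosine(v, t)
```

Calling `match_score(regions, saliency, words)` with lists of equal-dimension feature vectors returns a similarity in [-1, 1]; in this toy version a highly salient region dominates the visual vector and therefore also steers which words receive attention.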
Pages: 5753-5762
Number of pages: 10