Saliency-Guided Attention Network for Image-Sentence Matching

Cited by: 83
Authors
Ji, Zhong [1]
Wang, Haoran [1]
Han, Jungong [2]
Pang, Yanwei [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[2] Univ Warwick, WMG Data Sci, Coventry, W Midlands, England
Funding
National Natural Science Foundation of China;
Keywords
OBJECT DETECTION;
DOI
10.1109/ICCV.2019.00585
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper studies the task of matching images and sentences, where the main challenge is learning representations that bridge the semantic gap between image content and language. Unlike previous approaches, which predominantly use symmetrical architectures to represent both modalities, we introduce a Saliency-guided Attention Network (SAN) that builds an asymmetrical link between vision and language to efficiently learn fine-grained cross-modal correlations. SAN has three main components: a saliency detector, a Saliency-weighted Visual Attention (SVA) module, and a Saliency-guided Textual Attention (STA) module. Concretely, the saliency detector provides the visual saliency information that drives both attention modules. Using this saliency information, SVA learns more discriminative visual features. By fusing the visual information from SVA with intra-modal information as multi-modal guidance, STA produces powerful textual representations that are synchronized with visual clues. Extensive experiments demonstrate that SAN improves on the state-of-the-art results on the benchmark Flickr30K and MSCOCO datasets by a large margin.
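The asymmetry described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the softmax pooling, the additive fusion of the visual vector with the mean word vector, and the cosine score are all assumptions chosen only to show the data flow, in which the visual branch depends on saliency alone while the textual branch is conditioned on the visual output.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def saliency_weighted_visual(region_feats, saliency):
    """Hypothetical stand-in for SVA: pool regions by saliency weights."""
    return weighted_sum(softmax(saliency), region_feats)


def visually_guided_text(word_feats, visual_vec):
    """Hypothetical stand-in for STA: attend over words using guidance
    built from the visual vector plus the mean word vector (an assumed
    form of the "intra-modal information")."""
    uniform = [1.0 / len(word_feats)] * len(word_feats)
    mean_word = weighted_sum(uniform, word_feats)
    guidance = [v + t for v, t in zip(visual_vec, mean_word)]
    weights = softmax([dot(w, guidance) for w in word_feats])
    return weighted_sum(weights, word_feats)


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def match_score(region_feats, saliency, word_feats):
    """Image-sentence similarity under the toy asymmetric scheme."""
    visual = saliency_weighted_visual(region_feats, saliency)
    text = visually_guided_text(word_feats, visual)
    return cosine(visual, text)
```

In this sketch, raising the saliency of a region shifts the pooled visual vector toward that region, which in turn shifts the word attention toward words aligned with it. The image representation never sees the sentence, which is the one-directional link the abstract refers to.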
Pages: 5753 - 5762
Page count: 10
Related papers
50 in total
  • [21] Deep Convolutional Neural Network for Bidirectional Image-Sentence Mapping
    Yu, Tianyuan
    Bai, Liang
    Guo, Jinlin
    Yang, Zheng
    Xie, Yuxiang
    MULTIMEDIA MODELING, MMM 2017, PT II, 2017, 10133 : 136 - 147
  • [22] Cross-Modal Hybrid Feature Fusion for Image-Sentence Matching
    Xu, Xing
    Wang, Yifan
    He, Yixuan
    Yang, Yang
    Hanjalic, Alan
    Shen, Heng Tao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (04)
  • [23] Enhancing by Saliency-guided Decolorization
    Ancuti, Codruta Orniana
    Ancuti, Cosmin
    Bekaert, Phillipe
    2011 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2011, : 257 - 264
  • [24] SiSL-Net: Saliency-guided self-supervised learning network for image classification
    Liu, Kun
    Meng, Rui
    Li, Longteng
    Mao, Jingkun
    Chen, Haiyong
    NEUROCOMPUTING, 2022, 510 : 193 - 202
  • [25] SVAM: Saliency-guided Visual Attention Modeling by Autonomous Underwater Robots
    RoboPI Group, Dept. of ECE, University of Florida, FL, United States
    Unknown
    Robot. Sci. Syst., 1600,
  • [26] SVAM: Saliency-guided Visual Attention Modeling by Autonomous Underwater Robots
    Islam, Md Jahidul
    Wang, Ruobing
    Sattar, Junaed
    ROBOTICS: SCIENCE AND SYSTEM XVIII, 2022,
  • [27] Multi-View Saliency-Guided Clustering for Image Cosegmentation
    Tao, Zhiqiang
    Liu, Hongfu
    Fu, Huazhu
    Fu, Yun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (09) : 4634 - 4645
  • [28] Saliency-Guided Transformer Network combined with Local Embedding for No-Reference Image Quality Assessment
    Zhu, Mengmeng
    Hou, Guanqun
    Chen, Xinjia
    Xie, Jiaxing
    Lu, Haixian
    Che, Jun
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 1953 - 1962
  • [29] Image Cosegmentation via Saliency-Guided Constrained Clustering with Cosine Similarity
    Tao, Zhiqiang
    Liu, Hongfu
    Fu, Huazhu
    Fu, Yun
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4285 - 4291
  • [30] Saliency-guided convolution neural network-transformer fusion network for no-reference image quality assessment
    Wu, Lipeng
    Cui, Ziguan
    Gan, Zongliang
    Tang, Guijin
    Liu, Feng
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (06)