Incorporating Audio Signals into Constructing a Visual Saliency Map

Cited by: 0
Authors
Nakajima, Jiro [1 ]
Sugimoto, Akihiro [2 ]
Kawamoto, Kazuhiko [1 ]
Affiliations
[1] Chiba Univ, Chiba, Japan
[2] Natl Inst Informat, Tokyo, Japan
Source
Keywords
gaze; visual attention; visual saliency; auditory saliency; audio signal; video; sound source feature; auditory attention; model
DOI: not available
CLC number: TP [automation technology, computer technology]
Discipline code: 0812
Abstract
The saliency map has been proposed to identify regions that draw human visual attention. Feature differences from the surroundings are computed hierarchically, at multiple resolutions, for an image or an image sequence, and fused in a fully bottom-up manner to obtain a saliency map. A video usually contains sound, and auditory stimuli as well as visual stimuli attract human attention. Nevertheless, most conventional methods discard auditory information and compute a saliency map from image information alone. This paper presents a method for constructing a visual saliency map by integrating image features with auditory features. We assume a single moving sound source in a video and introduce a sound source feature. Our method detects the sound source feature using the correlation between audio signals and sound source motion, and computes its importance in each frame of the video using an auditory saliency map. This importance is used to fuse the sound source feature with image features into a visual saliency map. Experiments with human subjects demonstrate that saliency maps produced by the proposed method reflect human visual attention more accurately than those produced by a conventional method.
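The fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: all function names are hypothetical, and it assumes only that a per-frame auditory importance weight (taken from an auditory saliency map) scales a sound-source feature map before it is summed with normalized bottom-up image feature maps.

```python
import numpy as np

def normalize(m):
    """Scale a feature map to [0, 1] so no single map dominates the fusion."""
    lo, hi = m.min(), m.max()
    return (m - lo) / (hi - lo) if hi > lo else np.zeros_like(m)

def fuse_saliency(image_feature_maps, sound_source_map, auditory_importance):
    """Combine image feature maps with a sound-source feature map.

    auditory_importance in [0, 1] stands in for the per-frame weight the
    paper derives from an auditory saliency map (illustrative only).
    """
    # Bottom-up visual component: average of normalized image feature maps.
    visual = sum(normalize(m) for m in image_feature_maps) / len(image_feature_maps)
    # Add the audio-driven component, weighted by its per-frame importance.
    return normalize(visual + auditory_importance * normalize(sound_source_map))

# Toy example: two 4x4 image feature maps and one sound-source map.
rng = np.random.default_rng(0)
maps = [rng.random((4, 4)) for _ in range(2)]
sound = rng.random((4, 4))
saliency = fuse_saliency(maps, sound, auditory_importance=0.7)
print(saliency.shape)  # (4, 4)
```

In practice the image feature maps would be the usual intensity, color, orientation, and motion channels of a bottom-up saliency model, and the importance weight would vary frame by frame with the audio signal.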
Pages: 468-480 (13 pages)