Incorporating Audio Signals into Constructing a Visual Saliency Map

Cited by: 0
Authors
Nakajima, Jiro [1 ]
Sugimoto, Akihiro [2 ]
Kawamoto, Kazuhiko [1 ]
Affiliations
[1] Chiba Univ, Chiba, Japan
[2] Natl Inst Informat, Tokyo, Japan
Keywords
gaze; visual attention; visual saliency; auditory saliency; audio signal; video; sound source feature; AUDITORY ATTENTION; MODEL;
DOI
None
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
The saliency map has been proposed to identify regions that draw human visual attention. Differences of features from the surroundings are hierarchically computed for an image or an image sequence at multiple resolutions, and they are fused in a fully bottom-up manner to obtain a saliency map. A video usually contains sound, and not only visual stimuli but also auditory stimuli attract human attention. Nevertheless, most conventional methods discard auditory information and use image information alone in computing a saliency map. This paper presents a method for constructing a visual saliency map by integrating image features with auditory features. We assume a single moving sound source in a video and introduce a sound source feature. Our method detects the sound source feature using the correlation between audio signals and sound source motion, and computes its importance in each frame of the video using an auditory saliency map. This importance is used to fuse the sound source feature with image features into a visual saliency map. Experiments with human subjects demonstrate that saliency maps produced by our method reflect human visual attention more accurately than those produced by a conventional method.
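The abstract outlines a weighted fusion: the sound source feature's importance is derived from the correlation between the audio signal and the sound source's motion, and that importance controls how strongly the feature is blended into the image saliency map. The paper itself gives the precise formulation; the sketch below is only a minimal illustration of this idea, assuming a per-frame audio energy envelope, a per-frame motion-energy series, and hypothetical `image_saliency` and `sound_source_map` arrays (none of these names or the linear-blend rule come from the paper).

```python
import numpy as np

def fuse_saliency(image_saliency, sound_source_map, audio_envelope, motion_energy):
    """Illustrative fusion of an image saliency map with a sound-source
    feature map, weighted by the audio/motion correlation (a hypothetical
    simplification of the method described in the abstract)."""
    # Importance of the sound source: Pearson-style correlation between
    # the audio energy envelope and the source's motion energy over frames.
    a = audio_envelope - audio_envelope.mean()
    m = motion_energy - motion_energy.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(m)
    w = float(np.dot(a, m) / denom) if denom > 0 else 0.0
    w = max(w, 0.0)  # keep only positive correlation as importance
    # Linear blend: audio-correlated regions gain saliency.
    fused = (1.0 - w) * image_saliency + w * sound_source_map
    peak = fused.max()
    return fused / peak if peak > 0 else fused
```

When the audio envelope and the source's motion are perfectly correlated, the weight saturates at 1 and the fused map is dominated by the sound source feature; with no correlation, the map reduces to the purely visual saliency map.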
Pages: 468 - 480
Page count: 13
Related Papers
50 items total
  • [21] Constructing audio-visual representations of consumer archetypes
    Caldwell, Marylouise
    Henry, Paul
    Alman, Ariell
    QUALITATIVE MARKET RESEARCH, 2010, 13 (01): 84 - +
  • [22] Lightweight single image deraining algorithm incorporating visual saliency
    Hu, Mingdi
    Yang, Jingbing
    Ling, Nam
    Liu, Yuhong
    Fan, Jiulun
    IET IMAGE PROCESSING, 2022, 16 (12) : 3190 - 3200
  • [23] Fusion of Visual and Audio Signals for Wildlife Surveillance
    Ng, Cheng Hao
    Connie, Tee
    Choo, Kan Yeep
    Goh, Michael Kah Ong
    INTERNATIONAL JOURNAL OF TECHNOLOGY, 2022, 13 (06) : 1213 - 1221
  • [24] Effects Selection Technique for Improving Visual Attraction via Visual Saliency Map
    Suzuki, Natsumi
    Nakada, Yohei
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 1030 - 1037
  • [25] VIDEO EVENT DETECTION AND SUMMARIZATION USING AUDIO, VISUAL AND TEXT SALIENCY
    Evangelopoulos, G.
    Zlatintsi, A.
    Skoumas, G.
    Rapantzikos, K.
    Potamianos, A.
    Maragos, P.
    Avrithis, Y.
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3553 - +
  • [26] Saliency Prediction in Uncategorized Videos Based on Audio-Visual Correlation
    Qamar, Maryam
    Qamar, Suleman
    Muneeb, Muhammad
    Bae, Sung-Ho
    Rahman, Anis
    IEEE ACCESS, 2023, 11 : 15460 - 15470
  • [27] DEEP AUDIO-VISUAL FUSION NEURAL NETWORK FOR SALIENCY ESTIMATION
    Yao, Shunyu
    Min, Xiongkuo
    Zhai, Guangtao
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 1604 - 1608
  • [28] Audio-Visual Temporal Saliency Modeling Validated by fMRI Data
    Koutras, Petros
    Panagiotaropoulou, Georgia
    Tsiami, Antigoni
    Maragos, Petros
    PROCEEDINGS 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2018, : 2081 - 2091
  • [29] Audio-visual collaborative representation learning for Dynamic Saliency Prediction
    Ning, Hailong
    Zhao, Bin
    Hu, Zhanxuan
    He, Lang
    Pei, Ercheng
    KNOWLEDGE-BASED SYSTEMS, 2022, 256
  • [30] A Multimodal Saliency Model for Videos With High Audio-Visual Correspondence
    Min, Xiongkuo
    Zhai, Guangtao
    Zhou, Jiantao
    Zhang, Xiao-Ping
    Yang, Xiaokang
    Guan, Xinping
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 3805 - 3819