Incorporating Audio Signals into Constructing a Visual Saliency Map

Authors
Nakajima, Jiro [1]
Sugimoto, Akihiro [2]
Kawamoto, Kazuhiko [1]
Affiliations
[1] Chiba Univ, Chiba, Japan
[2] Natl Inst Informat, Tokyo, Japan
Source
Keywords
gaze; visual attention; visual saliency; auditory saliency; audio signal; video; sound source feature; AUDITORY ATTENTION; MODEL;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Saliency maps have been proposed to identify regions that draw human visual attention. For an image or image sequence, differences between features and their surroundings are computed hierarchically at multiple resolutions and fused in a fully bottom-up manner to obtain a saliency map. A video usually contains sound, and auditory stimuli attract human attention just as visual stimuli do. Nevertheless, most conventional methods discard auditory information and compute the saliency map from image information alone. This paper presents a method for constructing a visual saliency map by integrating image features with auditory features. We assume a single moving sound source in a video and introduce a sound source feature. Our method detects the sound source feature using the correlation between audio signals and sound source motion, and computes its importance in each video frame using an auditory saliency map. This importance is used to fuse the sound source feature with image features to construct a visual saliency map. Experiments with human subjects demonstrate that the saliency map produced by our proposed method reflects human visual attention more accurately than one produced by a conventional method.
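The abstract gives no formulas, so the following is only a minimal sketch of the two steps it describes, under stated assumptions: the sound source feature is approximated by a per-pixel Pearson correlation between motion magnitude and audio energy over time, and the fusion is a simple linear blend weighted by a scalar importance. The function names (`normalize`, `sound_source_map`, `fuse`), the choice of Pearson correlation, and the linear weighting are all illustrative assumptions, not the authors' actual formulation.

```python
import numpy as np

def normalize(m):
    """Scale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    if span == 0:
        return np.zeros_like(m)
    return (m - m.min()) / span

def sound_source_map(motion, audio_energy):
    """Assumed sound source feature: per-pixel correlation with the audio.

    motion:       array (T, H, W), motion magnitude per frame
    audio_energy: array (T,), audio energy per frame
    Returns an (H, W) map; negative correlations are clipped to zero.
    """
    motion = np.asarray(motion, dtype=float)
    audio = np.asarray(audio_energy, dtype=float)
    m = motion - motion.mean(axis=0)          # center each pixel over time
    a = audio - audio.mean()                  # center the audio signal
    num = np.tensordot(a, m, axes=(0, 0))     # covariance term, shape (H, W)
    den = np.sqrt((m ** 2).sum(axis=0) * (a ** 2).sum())
    corr = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return np.clip(corr, 0.0, 1.0)

def fuse(image_maps, source_map, audio_weight):
    """Assumed fusion: blend averaged image features with the source map.

    audio_weight stands in for the per-frame importance that the abstract
    derives from an auditory saliency map; here it is a scalar in [0, 1].
    """
    visual = np.mean([normalize(f) for f in image_maps], axis=0)
    w = float(np.clip(audio_weight, 0.0, 1.0))
    return normalize((1.0 - w) * visual + w * normalize(source_map))

# Toy example: one pixel moves in step with the audio, the rest stay still.
T, H, W = 8, 4, 4
audio = np.sin(np.linspace(0.0, 3.0, T))
motion = np.zeros((T, H, W))
motion[:, 1, 2] = 2.0 * audio + 3.0
source = sound_source_map(motion, audio)

image_feature = np.zeros((H, W))
image_feature[0, 0] = 1.0                     # a purely visual peak
quiet = fuse([image_feature], source, audio_weight=0.0)
loud = fuse([image_feature], source, audio_weight=1.0)
```

In this toy setup, a weight of zero leaves the visual peak dominant, while a weight of one shifts the peak to the pixel whose motion tracks the audio. A real system would compute the weight per frame rather than fixing it by hand.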
Pages: 468-480
Page count: 13