Incorporating Audio Signals into Constructing a Visual Saliency Map

Cited by: 0
Authors
Nakajima, Jiro [1 ]
Sugimoto, Akihiro [2 ]
Kawamoto, Kazuhiko [1 ]
Affiliations
[1] Chiba Univ, Chiba, Japan
[2] Natl Inst Informat, Tokyo, Japan
Keywords
gaze; visual attention; visual saliency; auditory saliency; audio signal; video; sound source feature; AUDITORY ATTENTION; MODEL;
DOI
None
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
The saliency map has been proposed to identify regions that draw human visual attention. Differences of features from the surroundings are hierarchically computed for an image or an image sequence at multiple resolutions, and they are fused in a fully bottom-up manner to obtain a saliency map. A video usually contains sound, and not only visual stimuli but also auditory stimuli attract human attention. Nevertheless, most conventional methods discard auditory information and use image information alone in computing a saliency map. This paper presents a method for constructing a visual saliency map by integrating image features with auditory features. We assume a single moving sound source in a video and introduce a sound source feature. Our method detects the sound source feature using the correlation between audio signals and sound source motion, and computes its importance in each frame of the video using an auditory saliency map. This importance is used to fuse the sound source feature with image features into a visual saliency map. Experiments with human subjects demonstrate that saliency maps produced by our method reflect human visual attention more accurately than those produced by a conventional method.
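The abstract outlines a weighted fusion: the sound source feature's importance is derived from the correlation between the audio signal and the sound source's motion, and that importance controls how strongly the feature is blended into the image saliency map. The paper itself gives the precise formulation; the sketch below is only a minimal illustration of this idea, assuming a per-frame audio energy envelope, a per-frame motion-energy series, and hypothetical `image_saliency` and `sound_source_map` arrays (none of these names or the linear-blend rule come from the paper).

```python
import numpy as np

def fuse_saliency(image_saliency, sound_source_map, audio_envelope, motion_energy):
    """Illustrative fusion of an image saliency map with a sound-source
    feature map, weighted by the audio/motion correlation (a hypothetical
    simplification of the method described in the abstract)."""
    # Importance of the sound source: Pearson-style correlation between
    # the audio energy envelope and the source's motion energy over frames.
    a = audio_envelope - audio_envelope.mean()
    m = motion_energy - motion_energy.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(m)
    w = float(np.dot(a, m) / denom) if denom > 0 else 0.0
    w = max(w, 0.0)  # keep only positive correlation as importance
    # Linear blend: audio-correlated regions gain saliency.
    fused = (1.0 - w) * image_saliency + w * sound_source_map
    peak = fused.max()
    return fused / peak if peak > 0 else fused
```

When the audio envelope and the source's motion are perfectly correlated, the weight saturates at 1 and the fused map is dominated by the sound source feature; with no correlation, the map reduces to the purely visual saliency map.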
Pages: 468 - 480
Page count: 13
Related Papers
50 items total
  • [21] Constructing audio-visual representations of consumer archetypes
    Caldwell, Marylouise
    Henry, Paul
    Alman, Ariell
    QUALITATIVE MARKET RESEARCH, 2010, 13 (01): 84 - +
  • [22] Lightweight single image deraining algorithm incorporating visual saliency
    Hu, Mingdi
    Yang, Jingbing
    Ling, Nam
    Liu, Yuhong
    Fan, Jiulun
    IET IMAGE PROCESSING, 2022, 16 (12) : 3190 - 3200
  • [23] Fusion of Visual and Audio Signals for Wildlife Surveillance
    Ng, Cheng Hao
    Connie, Tee
    Choo, Kan Yeep
    Goh, Michael Kah Ong
    INTERNATIONAL JOURNAL OF TECHNOLOGY, 2022, 13 (06) : 1213 - 1221
  • [24] Effects Selection Technique for Improving Visual Attraction via Visual Saliency Map
    Suzuki, Natsumi
    Nakada, Yohei
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 1030 - 1037
  • [25] VIDEO EVENT DETECTION AND SUMMARIZATION USING AUDIO, VISUAL AND TEXT SALIENCY
    Evangelopoulos, G.
    Zlatintsi, A.
    Skoumas, G.
    Rapantzikos, K.
    Potamianos, A.
    Maragos, P.
    Avrithis, Y.
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3553 - +
  • [26] Saliency Prediction in Uncategorized Videos Based on Audio-Visual Correlation
    Qamar, Maryam
    Qamar, Suleman
    Muneeb, Muhammad
    Bae, Sung-Ho
    Rahman, Anis
    IEEE ACCESS, 2023, 11 : 15460 - 15470
  • [27] DEEP AUDIO-VISUAL FUSION NEURAL NETWORK FOR SALIENCY ESTIMATION
    Yao, Shunyu
    Min, Xiongkuo
    Zhai, Guangtao
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 1604 - 1608
  • [28] Audio-Visual Temporal Saliency Modeling Validated by fMRI Data
    Koutras, Petros
    Panagiotaropoulou, Georgia
    Tsiami, Antigoni
    Maragos, Petros
    PROCEEDINGS 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2018, : 2081 - 2091
  • [29] Audio-visual collaborative representation learning for Dynamic Saliency Prediction
    Ning, Hailong
    Zhao, Bin
    Hu, Zhanxuan
    He, Lang
    Pei, Ercheng
    KNOWLEDGE-BASED SYSTEMS, 2022, 256
  • [30] A Multimodal Saliency Model for Videos With High Audio-Visual Correspondence
    Min, Xiongkuo
    Zhai, Guangtao
    Zhou, Jiantao
    Zhang, Xiao-Ping
    Yang, Xiaokang
    Guan, Xinping
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 3805 - 3819