Efficient video coding based on audio-visual focus of attention

Cited by: 26
Authors
Lee, Jong-Seok [1 ]
De Simone, Francesca [1 ]
Ebrahimi, Touradj [1 ]
Affiliations
[1] Ecole Polytechnique Federale de Lausanne (EPFL), Institute of Electrical Engineering, Multimedia Signal Processing Group (MMSPG), CH-1015 Lausanne, Switzerland
Keywords
Video coding; Audio-visual focus of attention; Quality of experience; Audio-visual source localization; H.264/AVC; Flexible macroblock ordering (FMO); Canonical correlation analysis; Subjective quality assessment; MULTIMODAL SPEAKER DETECTION; SPATIAL ATTENTION; TRACKING; LINKS; INTEGRATION; FOVEATION
DOI
10.1016/j.jvcir.2010.11.002
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Subject Classification Number
0812
Abstract
This paper proposes an efficient video coding method using audio-visual focus of attention, which is based on the observation that sound-emitting regions in an audio-visual sequence draw viewers' attention. First, an audio-visual source localization algorithm is presented, where the sound source is identified by using the correlation between the sound signal and the visual motion information. The localization result is then used to encode different regions in the scene with different quality in such a way that regions close to the source are encoded with higher quality than those far from the source. This is implemented in the framework of H.264/AVC by assigning different quantization parameters for different regions. Through experiments with both standard and high definition sequences, it is demonstrated that the proposed method can yield considerable coding gains over the constant quantization mode of H.264/AVC without noticeable degradation of perceived quality. (C) 2010 Elsevier Inc. All rights reserved.
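
The abstract outlines two technical steps: localizing the sound source by correlating the audio signal with per-region visual motion (the keywords point to canonical correlation analysis), and then assigning region-wise quantization parameters (QPs) in H.264/AVC so that regions near the source receive higher quality. The Python sketch below only illustrates that idea and is not the authors' implementation: scikit-learn's CCA, the frame-level audio-energy feature, the block grid, and the QP range 24-36 are all assumptions.

import numpy as np
from sklearn.cross_decomposition import CCA  # assumption: scikit-learn CCA stands in for the paper's correlation step

def localize_source(audio_energy, block_motion):
    # audio_energy: (T,) audio energy per video frame.
    # block_motion: (T, N) motion magnitude of each of N blocks per frame.
    # Both feature choices are illustrative, not taken from the paper.
    cca = CCA(n_components=1)
    a_proj, _ = cca.fit_transform(audio_energy.reshape(-1, 1), block_motion)
    # Score each block by how strongly its motion correlates with the audio projection.
    scores = np.array([
        abs(np.corrcoef(a_proj[:, 0], block_motion[:, i])[0, 1])
        for i in range(block_motion.shape[1])
    ])
    return int(np.nanargmax(scores))  # index of the estimated sound-emitting block

def qp_map(source_idx, grid_w, grid_h, qp_min=24, qp_max=36):
    # Lower QP (higher quality) near the localized source; the 24-36 range is a guess.
    sy, sx = divmod(source_idx, grid_w)
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    dist = np.hypot(ys - sy, xs - sx)
    if dist.max() > 0:
        dist = dist / dist.max()
    return np.round(qp_min + dist * (qp_max - qp_min)).astype(int)

For example, with a hypothetical 40 x 30 grid of 16 x 16 macroblocks, qp_map(localize_source(e, m), 40, 30) would produce a per-macroblock QP map that a region-aware H.264/AVC configuration (e.g. one using FMO slice groups, as the keywords suggest) could consume.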
Pages: 704-711
Number of pages: 8
Related Papers
50 records in total
  • [31] Audio-visual content analysis for content-based video indexing
    Tsekeridou, Sofia
    Pitas, Ioannis
    International Conference on Multimedia Computing and Systems - Proceedings, 1999, 1 : 667 - 672
  • [32] Content-based video parsing and indexing based on audio-visual interaction
    Tsekeridou, S
    Pitas, I
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2001, 11 (04) : 522 - 535
  • [33] Multi-Attention Audio-Visual Fusion Network for Audio Spatialization
    Zhang, Wen
    Shao, Jie
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 394 - 401
  • [34] Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
    Liu, Xubo
    Huang, Qiushi
    Mei, Xinhao
    Liu, Haohe
    Kong, Qiuqiang
    Sun, Jianyuan
    Li, Shengchen
    Ko, Tom
    Zhang, Yu
    Tang, Lilian H.
    Plumbley, Mark D.
    Kilic, Volkan
    Wang, Wenwu
    INTERSPEECH 2023, 2023, : 2838 - 2842
  • [35] Towards Audio-Visual Saliency Prediction for Omnidirectional Video with Spatial Audio
    Chao, Fang-Yi
    Ozcinar, Cagri
    Zhang, Lu
    Hamidouche, Wassim
    Deforges, Olivier
    Smolic, Aljosa
    2020 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2020, : 355 - 358
  • [36] Perceptual Quality of Audio-Visual Content with Common Video and Audio Degradations
    Becerra Martinez, Helard
    Hines, Andrew
    Farias, Mylene C. Q.
    APPLIED SCIENCES-BASEL, 2021, 11 (13):
  • [37] An Audio-Visual Attention System for Online Association Learning
    Heckmann, Martin
    Brandl, Holger
    Domont, Xavier
    Bolder, Bram
    Joublin, Frank
    Goerick, Christian
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2127 - 2130
  • [38] Efficient audio-visual information fusion using encoding pace synchronization for Audio-Visual Speech Separation
    Xu, Xinmeng
    Tu, Weiping
    Yang, Yuhong
    INFORMATION FUSION, 2025, 115
  • [39] DEEP AUDIO-VISUAL SPEECH SEPARATION WITH ATTENTION MECHANISM
    Li, Chenda
    Qian, Yanmin
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7314 - 7318
  • [40] Self-Supervised Video Representation and Temporally Adaptive Attention for Audio-Visual Event Localization
    Ran, Yue
    Tang, Hongying
    Li, Baoqing
    Wang, Guohui
    APPLIED SCIENCES-BASEL, 2022, 12 (24):