Efficient video coding based on audio-visual focus of attention

Cited by: 26
Authors
Lee, Jong-Seok [1 ]
De Simone, Francesca [1 ]
Ebrahimi, Touradj [1 ]
Affiliations
[1] Ecole Polytechnique Federale de Lausanne (EPFL), Institute of Electrical Engineering, Multimedia Signal Processing Group (MMSPG), CH-1015 Lausanne, Switzerland
Keywords
Video coding; Audio-visual focus of attention; Quality of experience; Audio-visual source localization; H.264/AVC; Flexible macroblock ordering (FMO); Canonical correlation analysis; Subjective quality assessment; MULTIMODAL SPEAKER DETECTION; SPATIAL ATTENTION; TRACKING; LINKS; INTEGRATION; FOVEATION
DOI
10.1016/j.jvcir.2010.11.002
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Subject Classification Number
0812
Abstract
This paper proposes an efficient video coding method using audio-visual focus of attention, which is based on the observation that sound-emitting regions in an audio-visual sequence draw viewers' attention. First, an audio-visual source localization algorithm is presented, where the sound source is identified by using the correlation between the sound signal and the visual motion information. The localization result is then used to encode different regions in the scene with different quality in such a way that regions close to the source are encoded with higher quality than those far from the source. This is implemented in the framework of H.264/AVC by assigning different quantization parameters for different regions. Through experiments with both standard and high definition sequences, it is demonstrated that the proposed method can yield considerable coding gains over the constant quantization mode of H.264/AVC without noticeable degradation of perceived quality. (C) 2010 Elsevier Inc. All rights reserved.
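
The abstract outlines two technical steps: localizing the sound source by correlating the audio signal with per-region visual motion (the keywords point to canonical correlation analysis), and then assigning region-wise quantization parameters (QPs) in H.264/AVC so that regions near the source receive higher quality. The Python sketch below only illustrates that idea and is not the authors' implementation: scikit-learn's CCA, the frame-level audio-energy feature, the block grid, and the QP range 24-36 are all assumptions.

import numpy as np
from sklearn.cross_decomposition import CCA  # assumption: scikit-learn CCA stands in for the paper's correlation step

def localize_source(audio_energy, block_motion):
    # audio_energy: (T,) audio energy per video frame.
    # block_motion: (T, N) motion magnitude of each of N blocks per frame.
    # Both feature choices are illustrative, not taken from the paper.
    cca = CCA(n_components=1)
    a_proj, _ = cca.fit_transform(audio_energy.reshape(-1, 1), block_motion)
    # Score each block by how strongly its motion correlates with the audio projection.
    scores = np.array([
        abs(np.corrcoef(a_proj[:, 0], block_motion[:, i])[0, 1])
        for i in range(block_motion.shape[1])
    ])
    return int(np.nanargmax(scores))  # index of the estimated sound-emitting block

def qp_map(source_idx, grid_w, grid_h, qp_min=24, qp_max=36):
    # Lower QP (higher quality) near the localized source; the 24-36 range is a guess.
    sy, sx = divmod(source_idx, grid_w)
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    dist = np.hypot(ys - sy, xs - sx)
    if dist.max() > 0:
        dist = dist / dist.max()
    return np.round(qp_min + dist * (qp_max - qp_min)).astype(int)

For example, with a hypothetical 40 x 30 grid of 16 x 16 macroblocks, qp_map(localize_source(e, m), 40, 30) would produce a per-macroblock QP map that a region-aware H.264/AVC configuration (e.g. one using FMO slice groups, as the keywords suggest) could consume.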
Pages: 704-711
Number of pages: 8
Related Papers
50 records in total
  • [31] Audio-visual content analysis for content-based video indexing
    Tsekeridou, Sofia
    Pitas, Ioannis
    International Conference on Multimedia Computing and Systems - Proceedings, 1999, 1 : 667 - 672
  • [32] Content-based video parsing and indexing based on audio-visual interaction
    Tsekeridou, S
    Pitas, I
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2001, 11 (04) : 522 - 535
  • [33] Multi-Attention Audio-Visual Fusion Network for Audio Spatialization
    Zhang, Wen
    Shao, Jie
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 394 - 401
  • [34] Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
    Liu, Xubo
    Huang, Qiushi
    Mei, Xinhao
    Liu, Haohe
    Kong, Qiuqiang
    Sun, Jianyuan
    Li, Shengchen
    Ko, Tom
    Zhang, Yu
    Tang, Lilian H.
    Plumbley, Mark D.
    Kilic, Volkan
    Wang, Wenwu
    INTERSPEECH 2023, 2023, : 2838 - 2842
  • [35] Towards Audio-Visual Saliency Prediction for Omnidirectional Video with Spatial Audio
    Chao, Fang-Yi
    Ozcinar, Cagri
    Zhang, Lu
    Hamidouche, Wassim
    Deforges, Olivier
    Smolic, Aljosa
    2020 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2020, : 355 - 358
  • [36] Perceptual Quality of Audio-Visual Content with Common Video and Audio Degradations
    Becerra Martinez, Helard
    Hines, Andrew
    Farias, Mylene C. Q.
    APPLIED SCIENCES-BASEL, 2021, 11 (13):
  • [37] An Audio-Visual Attention System for Online Association Learning
    Heckmann, Martin
    Brandl, Holger
    Domont, Xavier
    Bolder, Bram
    Joublin, Frank
    Goerick, Christian
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2127 - 2130
  • [38] Efficient audio-visual information fusion using encoding pace synchronization for Audio-Visual Speech Separation
    Xu, Xinmeng
    Tu, Weiping
    Yang, Yuhong
    INFORMATION FUSION, 2025, 115
  • [39] DEEP AUDIO-VISUAL SPEECH SEPARATION WITH ATTENTION MECHANISM
    Li, Chenda
    Qian, Yanmin
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7314 - 7318
  • [40] Self-Supervised Video Representation and Temporally Adaptive Attention for Audio-Visual Event Localization
    Ran, Yue
    Tang, Hongying
    Li, Baoqing
    Wang, Guohui
    APPLIED SCIENCES-BASEL, 2022, 12 (24):