Efficient video coding based on audio-visual focus of attention

Cited by: 26
Authors
Lee, Jong-Seok [1]
De Simone, Francesca [1]
Ebrahimi, Touradj [1]
Affiliations
[1] Ecole Polytech Fed Lausanne, Inst Elect Engn, Multimedia Signal Proc Grp MMSPG, CH-1015 Lausanne, Switzerland
Keywords
Video coding; Audio-visual focus of attention; Quality of experience; Audio-visual source localization; H.264/AVC; Flexible macroblock ordering (FMO); Canonical correlation analysis; Subjective quality assessment; MULTIMODAL SPEAKER DETECTION; SPATIAL ATTENTION; TRACKING; LINKS; INTEGRATION; FOVEATION;
DOI
10.1016/j.jvcir.2010.11.002
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper proposes an efficient video coding method using audio-visual focus of attention, which is based on the observation that sound-emitting regions in an audio-visual sequence draw viewers' attention. First, an audio-visual source localization algorithm is presented, where the sound source is identified by using the correlation between the sound signal and the visual motion information. The localization result is then used to encode different regions in the scene with different quality in such a way that regions close to the source are encoded with higher quality than those far from the source. This is implemented in the framework of H.264/AVC by assigning different quantization parameters for different regions. Through experiments with both standard and high definition sequences, it is demonstrated that the proposed method can yield considerable coding gains over the constant quantization mode of H.264/AVC without noticeable degradation of perceived quality. (C) 2010 Elsevier Inc. All rights reserved.
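The region-dependent quantization described in the abstract can be illustrated with a minimal sketch. It assumes the sound source has already been localized to a pixel position, and it raises the quantization parameter (QP) of each 16x16 macroblock in steps as its distance from the source grows. The function name `qp_map`, the parameters `base_qp`, `max_offset`, and `radius_step`, and the stepwise distance-to-QP mapping are all hypothetical choices for illustration; the abstract does not specify the mapping used in the paper, and the sketch does not model FMO slice groups.

```python
import math

MB_SIZE = 16   # H.264/AVC macroblock size in pixels
QP_MAX = 51    # upper bound of the H.264/AVC QP range for 8-bit video


def qp_map(width, height, source_xy, base_qp=26, max_offset=8, radius_step=4):
    """Return a per-macroblock QP grid (list of rows).

    Hypothetical mapping: the QP grows by 1 for every `radius_step`
    macroblocks of distance from the macroblock containing the source,
    up to `max_offset` above `base_qp`, and is clamped to QP_MAX.
    """
    cols = math.ceil(width / MB_SIZE)
    rows = math.ceil(height / MB_SIZE)
    src_col = source_xy[0] // MB_SIZE
    src_row = source_xy[1] // MB_SIZE

    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Euclidean distance to the source, measured in macroblocks
            dist = math.hypot(c - src_col, r - src_row)
            offset = min(max_offset, int(dist // radius_step))
            row.append(min(QP_MAX, base_qp + offset))
        grid.append(row)
    return grid


if __name__ == "__main__":
    # Standard definition frame with an assumed source near the centre
    for row in qp_map(720, 576, (360, 288))[:3]:
        print(row)
```

A lower QP means finer quantization, so macroblocks near the assumed source position keep the base quality while distant ones are coded more coarsely, which is the general idea the abstract attributes to the proposed method.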
Pages: 704-711
Page count: 8
Related papers
50 in total
  • [21] Combining audio and video metrics to assess audio-visual quality
    Helard A. Becerra Martinez
    Mylène C. Q. Farias
    Multimedia Tools and Applications, 2018, 77 : 23993 - 24012
  • [22] Indexing audio-visual sequences by joint audio and video processing
    Saraceno, C
    Leonardi, R
    VSMM98: FUTUREFUSION - APPLICATION REALITIES FOR THE VIRTUAL AGE, VOLS 1 AND 2, 1998, : 686 - 691
  • [23] Video concept detection by audio-visual grouplets
    Wei Jiang
    Alexander C. Loui
    International Journal of Multimedia Information Retrieval, 2012, 1 (4) : 223 - 238
  • [24] Audio-Visual Emotion Recognition in Video Clips
    Noroozi, Fatemeh
    Marjanovic, Marina
    Njegus, Angelina
    Escalera, Sergio
    Anbarjafari, Gholamreza
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2019, 10 (01) : 60 - 75
  • [25] An audio-visual approach to web video categorization
    Bogdan Emanuel Ionescu
    Klaus Seyerlehner
    Ionuţ Mironică
    Constantin Vertan
    Patrick Lambert
    Multimedia Tools and Applications, 2014, 70 : 1007 - 1032
  • [26] Video concept detection by audio-visual grouplets
    Jiang, Wei
    Loui, Alexander C.
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2012, 1 (04) : 223 - 238
  • [27] Efficient Audio-Visual Speech Enhancement Using Deep U-Net With Early Fusion of Audio and Video Information and RNN Attention Blocks
    Hwang, Jung-Wook
    Park, Rae-Hong
    Park, Hyung-Min
    IEEE ACCESS, 2021, 9 : 137584 - 137598
  • [28] Audio-visual event detection based on mining of semantic audio-visual labels
    Goh, KS
    Miyahara, K
    Radhakrishan, R
    Xiong, ZY
    Divakaran, A
    STORAGE AND RETRIEVAL METHODS AND APPLICATIONS FOR MULTIMEDIA 2004, 2004, 5307 : 292 - 299
  • [29] Audio-Visual Speech Enhancement Based on Multiscale Features and Parallel Attention
    Jia, Shifan
    Zhang, Xinman
    Han, Weiqi
    2024 23RD INTERNATIONAL SYMPOSIUM INFOTEH-JAHORINA, INFOTEH, 2024,
  • [30] Audio-visual content analysis for content-based video indexing
    Tsekeridou, S
    Pitas, I
    IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS, PROCEEDINGS VOL 1, 1999, : 667 - 672