Efficient video coding based on audio-visual focus of attention

被引:26
|
作者
Lee, Jong-Seok [1 ]
De Simone, Francesca [1 ]
Ebrahimi, Touradj [1 ]
机构
[1] Ecole Polytech Fed Lausanne, Inst Elect Engn, Multimedia Signal Proc Grp MMSPG, CH-1015 Lausanne, Switzerland
关键词
Video coding; Audio-visual focus of attention; Quality of experience; Audio-visual source localization; H.264/AVC; Flexible macroblock ordering (FMO); Canonical correlation analysis; Subjective quality assessment; MULTIMODAL SPEAKER DETECTION; SPATIAL ATTENTION; TRACKING; LINKS; INTEGRATION; FOVEATION;
D O I
10.1016/j.jvcir.2010.11.002
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
This paper proposes an efficient video coding method using audio-visual focus of attention, which is based on the observation that sound-emitting regions in an audio-visual sequence draw viewers' attention. First, an audio-visual source localization algorithm is presented, where the sound source is identified by using the correlation between the sound signal and the visual motion information. The localization result is then used to encode different regions in the scene with different quality in such a way that regions close to the source are encoded with higher quality than those far from the source. This is implemented in the framework of H.264/AVC by assigning different quantization parameters for different regions. Through experiments with both standard and high definition sequences, it is demonstrated that the proposed method can yield considerable coding gains over the constant quantization mode of H.264/AVC without noticeable degradation of perceived quality. (C) 2010 Elsevier Inc. All rights reserved.
引用
收藏
页码:704 / 711
页数:8
相关论文
共 50 条
  • [1] VIDEO CODING BASED ON AUDIO-VISUAL ATTENTION
    Lee, Jong-Seok
    De Simone, Francesca
    Ebrahimi, Touradj
    ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 57 - 60
  • [2] Subjective Quality Evaluation of Foveated Video Coding Using Audio-Visual Focus of Attention
    Lee, Jong-Seok
    De Simone, Francesca
    Ebrahimi, Touradj
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2011, 5 (07) : 1322 - 1331
  • [3] Attention-Based Audio-Visual Fusion for Video Summarization
    Fang, Yinghong
    Zhang, Junpeng
    Lu, Cewu
    NEURAL INFORMATION PROCESSING (ICONIP 2019), PT II, 2019, 11954 : 328 - 340
  • [4] A audio-visual model for efficient video summarization
    El-Nagar, Gamal
    El-Sawy, Ahmed
    Rashad, Metwally
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 100
  • [5] Efficient Video Coding in H.264/AVC by using Audio-Visual Information
    Lee, Jong-Seok
    Ebrahimi, Touradj
    2009 IEEE INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP 2009), 2009, : 402 - 407
  • [6] Audio-Visual Glance Network for Efficient Video Recognition
    Nugroho, Muhammad Adi
    Woo, Sangmin
    Lee, Sumin
    Kim, Changick
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 10116 - 10125
  • [7] Audio-visual and EEG-based Attention Modeling for Extraction of Affective Video Content
    Mehmood, Irfan
    Sajjad, Muhammad
    Baik, Sung Wook
    Rho, Seungmin
    2015 INTERNATIONAL CONFERENCE ON PLATFORM TECHNOLOGY AND SERVICE (PLATCON), 2015, : 17 - 18
  • [8] VIDEO AND EDUCATIONAL ROBOTICS: AN INNOVATIVE INTEGRATION OF AUDIO-VISUAL LANGUAGE AND CODING
    Denicolai, Lorenzo
    Grimaldi, Renato
    Palmieri, Silvia
    INTED2016: 10TH INTERNATIONAL TECHNOLOGY, EDUCATION AND DEVELOPMENT CONFERENCE, 2016, : 2617 - 2624
  • [9] Audio-visual speech processing and attention
    Sams, M
    PSYCHOPHYSIOLOGY, 2003, 40 : S5 - S6
  • [10] Audio-Visual Salieny Network with Audio Attention Module
    Cheng, Shuaiyang
    Gao, Xing
    Song, Liang
    Xiahou, Jianbing
    PROCEEDINGS OF 2021 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INFORMATION SYSTEMS (ICAIIS '21), 2021,