Audio-visual event recognition in surveillance video sequences

Cited by: 106
Authors
Cristani, Marco [1]
Bicego, Manuele
Murino, Vittorio
Affiliations
[1] Univ Verona, Dipartimento Informat, I-37134 Verona, Italy
[2] Univ Sassari, DEIR, I-07100 Sassari, Italy
Keywords
audio-visual analysis; automated surveillance; event classification and clustering; multimodal background modelling and foreground detection; multimodality; scene analysis
DOI
10.1109/TMM.2006.886263
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the context of the automated surveillance field, automatic scene analysis and understanding systems typically consider only visual information, whereas other modalities, such as audio, are typically disregarded. This paper presents a new method able to integrate audio and visual information for scene analysis in a typical surveillance scenario, using only one camera and one monaural microphone. Visual information is analyzed by a standard visual background/foreground (BG/FG) modelling module, enhanced with a novelty detection stage and coupled with an audio BG/FG modelling scheme. These processes permit one to detect separate audio and visual patterns representing unusual unimodal events in a scene. The integration of audio and visual data is subsequently performed by exploiting the concept of synchrony between such events. The audio-visual (AV) association is carried out on-line and without need for training sequences, and is actually based on the computation of a characteristic feature called audio-video concurrence matrix, allowing one to detect and segment AV events, as well as to discriminate between them. Experimental tests involving classification and clustering of events show all the potentialities of the proposed approach, also in comparison with the results obtained by employing the single modalities and without considering the synchrony issue.
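The abstract does not define the audio-video concurrence matrix in detail. As a rough illustration of the synchrony idea only, the sketch below counts how often an audio foreground flag and a visual foreground flag are active in the same frame. The per-frame binary flags, the split into audio bands and video zones, and the name concurrence_matrix are all assumptions made for this example, not the authors' formulation.

```python
def concurrence_matrix(audio_fg, video_fg):
    """Count co-occurring foreground activity per (audio band, video zone).

    audio_fg: list of frames, each a list of 0/1 flags, one per audio band
    video_fg: list of frames, each a list of 0/1 flags, one per video zone
    Both layouts are hypothetical and chosen only for illustration.
    """
    if len(audio_fg) != len(video_fg):
        raise ValueError("audio and video must have the same number of frames")
    n_bands = len(audio_fg[0]) if audio_fg else 0
    n_zones = len(video_fg[0]) if video_fg else 0
    counts = [[0] * n_zones for _ in range(n_bands)]
    for a_frame, v_frame in zip(audio_fg, video_fg):
        for i, a in enumerate(a_frame):
            if not a:
                continue  # no audio foreground in this band for this frame
            for j, v in enumerate(v_frame):
                if v:
                    counts[i][j] += 1  # both modalities active: synchronous
    return counts


# Toy data: band 0 fires together with zone 0 twice, band 1 with zone 2 once.
audio_fg = [[0, 0], [1, 0], [1, 0], [0, 1]]
video_fg = [[0, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 1]]
avc = concurrence_matrix(audio_fg, video_fg)
```

In this toy setting, a large entry suggests that an audio pattern and a visual pattern tend to occur together, which is the kind of cue the abstract describes using to detect and segment AV events. The raw counts could be normalized by the number of frames if sequences of different lengths need to be compared.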
Pages: 257-267
Page count: 11
Related papers
50 in total
  • [41] Audio-visual Privacy Protection for Video Conference
    Venkatesh, M. Vijay
    Zhao, Jian
    Profitt, Larry
    Cheung, Sen-ching S.
    ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 1574 - 1575
  • [42] Combining audio and video metrics to assess audio-visual quality
    Helard A. Becerra Martinez
    Mylène C. Q. Farias
    Multimedia Tools and Applications, 2018, 77 : 23993 - 24012
  • [43] VIDEO CODING BASED ON AUDIO-VISUAL ATTENTION
    Lee, Jong-Seok
    De Simone, Francesca
    Ebrahimi, Touradj
    ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 57 - 60
  • [44] Video concept detection by audio-visual grouplets
    Wei Jiang
    Alexander C. Loui
    International Journal of Multimedia Information Retrieval, 2012, 1 (4) : 223 - 238
  • [45] A audio-visual model for efficient video summarization
    El-Nagar, Gamal
    El-Sawy, Ahmed
    Rashad, Metwally
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 100
  • [46] An audio-visual approach to web video categorization
    Bogdan Emanuel Ionescu
    Klaus Seyerlehner
    Ionuţ Mironică
    Constantin Vertan
    Patrick Lambert
    Multimedia Tools and Applications, 2014, 70 : 1007 - 1032
  • [47] Video concept detection by audio-visual grouplets
    Jiang, Wei
    Loui, Alexander C.
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2012, 1 (04) : 223 - 238
  • [48] Audio-Visual Attention Networks for Emotion Recognition
    Lee, Jiyoung
    Kim, Sunok
    Kim, Seungryong
    Sohn, Kwanghoon
    AVSU'18: PROCEEDINGS OF THE 2018 WORKSHOP ON AUDIO-VISUAL SCENE UNDERSTANDING FOR IMMERSIVE MULTIMEDIA, 2018, : 27 - 32
  • [49] Audio-Visual Learning for Multimodal Emotion Recognition
    Fan, Siyu
    Jing, Jianan
    Wang, Chongwen
    SYMMETRY-BASEL, 2025, 17 (03):
  • [50] Audio-visual biometric recognition by vector quantization
    Das, Amitava
    Ghosh, Prasanta
    2006 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, 2006, : 166 - +