Audiovisual Event Detection Towards Scene Understanding

Cited by: 0
Authors
Canton-Ferrer, C. [1 ]
Butko, T. [1 ]
Segura, C. [1 ]
Giro, X. [1 ]
Nadeu, C. [1 ]
Hernando, J. [1 ]
Casas, J. R. [1 ]
Affiliation
[1] Tech Univ Catalonia, Barcelona, Spain
Source
2009 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPR WORKSHOPS 2009), VOLS 1 AND 2 | 2009
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. This paper presents a system that detects and recognizes these events from a multimodal perspective, combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel, and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multi-person tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. Multimodal data fusion at score level is carried out using two approaches: weighted mean average and fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded, including manual annotations of the data. A set of metrics allows the performance of the presented algorithms to be assessed. This dataset is made publicly available for research purposes.
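The two score-level fusion approaches named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the modality names, the weights, the fuzzy measure values, and the choice of the Choquet form of the fuzzy integral (the abstract does not say which fuzzy integral is used). Scores are assumed to be non-negative per-modality confidences for one event class.

```python
from typing import Dict, FrozenSet


def weighted_mean_fusion(scores: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Fuse per-modality scores with a normalized weighted mean."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in scores) / total


def choquet_fusion(scores: Dict[str, float],
                   measure: Dict[FrozenSet[str], float]) -> float:
    """Fuse scores with a Choquet integral (one kind of fuzzy integral).

    `measure` maps each coalition of modalities that can arise (a frozenset
    of names) to its importance, with the full set mapped to 1.0.
    """
    # Visit modalities from the highest score to the lowest.
    ordered = sorted(scores, key=scores.get, reverse=True)
    fused = 0.0
    for i, name in enumerate(ordered):
        next_score = scores[ordered[i + 1]] if i + 1 < len(ordered) else 0.0
        coalition = frozenset(ordered[: i + 1])
        fused += (scores[name] - next_score) * measure[coalition]
    return fused


# Hypothetical scores for one event class from two modalities.
scores = {"audio": 0.8, "video": 0.4}
weights = {"audio": 0.7, "video": 0.3}

# An additive measure: the Choquet integral reduces to the weighted mean.
additive_measure = {
    frozenset({"audio"}): 0.7,
    frozenset({"video"}): 0.3,
    frozenset({"audio", "video"}): 1.0,
}

# A non-additive measure: the two modalities are treated as partly redundant.
redundant_measure = {
    frozenset({"audio"}): 0.6,
    frozenset({"video"}): 0.5,
    frozenset({"audio", "video"}): 1.0,
}

print(weighted_mean_fusion(scores, weights))
print(choquet_fusion(scores, additive_measure))
print(choquet_fusion(scores, redundant_measure))
```

With an additive measure the two functions agree, which shows that the fuzzy integral generalizes the weighted mean: a non-additive measure lets the fused score account for interaction between modalities, something fixed per-modality weights cannot express.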
Pages: 840 - 847
Page count: 8
Related papers
50 entries in total
  • [21] Towards Scene Understanding for Autonomous Operations on Airport Aprons
    Steininger, Daniel
    Kriegler, Andreas
    Pointner, Wolfgang
    Widhalm, Verena
    Simon, Julia
    Zendel, Oliver
    COMPUTER VISION - ACCV 2022 WORKSHOPS, 2023, 13848 : 153 - 169
  • [22] Towards Efficient Scene Understanding via Squeeze Reasoning
    Li, Xiangtai
    Li, Xia
    You, Ansheng
    Zhang, Li
    Cheng, Guangliang
    Yang, Kuiyuan
    Tong, Yunhai
    Lin, Zhouchen
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 7050 - 7063
  • [23] Towards Chl-a Bloom Understanding by EM-based Unsupervised Event Detection
    Poisson Caillault, Emilie
    Lefebvre, Alain
    OCEANS 2017 - ABERDEEN, 2017,
  • [24] Towards joint sound scene and polyphonic sound event recognition
    Bear, Helen L.
    Nolasco, Ines
    Benetos, Emmanouil
    INTERSPEECH 2019, 2019, : 4594 - 4598
  • [25] A DATABASE AND CHALLENGE FOR ACOUSTIC SCENE CLASSIFICATION AND EVENT DETECTION
    Giannoulis, Dimitrios
    Stowell, Dan
    Benetos, Emmanouil
    Rossignol, Mathias
    Lagrange, Mathieu
    Plumbley, Mark D.
    2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2013,
  • [26] Scene-based event detection for baseball videos
    Lien, Cheng-Chang
    Chiang, Chiu-Lung
    Lee, Chang-Hsing
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2007, 18 (01) : 1 - 14
  • [27] Towards understanding a protein surface recognition event
    Martinell, Marc
    Salvatella, Xavier
    Vilaseca, Marta
    Gairi, Margarida
    Giralt, Ernest
    Peptides 2004, Proceedings: BRIDGES BETWEEN DISCIPLINES, 2005, : 633 - 634
  • [28] SCENE-DEPENDENT ACOUSTIC EVENT DETECTION WITH SCENE CONDITIONING AND FAKE-SCENE-CONDITIONED LOSS
    Komatsu, Tatsuya
    Imoto, Keisuke
    Togami, Masahito
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 646 - 650
  • [29] Context-based environmental audio event recognition for scene understanding
    Tong Lu
    Gongyou Wang
    Feng Su
    Multimedia Systems, 2015, 21 : 507 - 524
  • [30] Context-based environmental audio event recognition for scene understanding
    Lu, Tong
    Wang, Gongyou
    Su, Feng
    MULTIMEDIA SYSTEMS, 2015, 21 (05) : 507 - 524