Audiovisual Event Detection Towards Scene Understanding

Authors
Canton-Ferrer, C. [1 ]
Butko, T. [1 ]
Segura, C. [1 ]
Giro, X. [1 ]
Nadeu, C. [1 ]
Hernando, J. [1 ]
Casas, J. R. [1 ]
Affiliation
[1] Tech Univ Catalonia, Barcelona, Spain
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. This paper presents a system that detects and recognizes these events from a multimodal perspective, combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel, and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multi-person tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. Multimodal data fusion at score level is carried out using two approaches: weighted arithmetic mean and fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded, including manual annotations of the data, and a set of metrics allows the performance of the presented algorithms to be assessed. This dataset is made publicly available for research purposes.
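The score-level fusion mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which fuzzy integral is used, so the discrete Choquet integral is assumed here, and the function names, the modality names ("audio", "video"), the scores, the weights, and the fuzzy measure values are all hypothetical.

```python
def weighted_mean_fusion(scores, weights):
    """Fuse per-modality scores with a weighted arithmetic mean."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


def choquet_integral(scores, measure):
    """Discrete Choquet integral of scores with respect to a fuzzy measure.

    scores:  dict mapping a source name to its score.
    measure: dict mapping a frozenset of source names to its measure value.
             It is assumed to be monotone, with the full set mapped to 1.0.
    """
    # Visit sources from the highest score to the lowest.
    ordered = sorted(scores, key=scores.get, reverse=True)
    fused = 0.0
    previous_mu = 0.0
    coalition = set()
    for name in ordered:
        coalition.add(name)
        mu = measure[frozenset(coalition)]
        # Each score is weighted by the measure gained when its source joins.
        fused += scores[name] * (mu - previous_mu)
        previous_mu = mu
    return fused


# Hypothetical classifier scores for one acoustic event class.
example_scores = {"audio": 0.8, "video": 0.4}

# Hypothetical fuzzy measure: importance of each subset of sources.
example_measure = {
    frozenset({"audio"}): 0.6,
    frozenset({"video"}): 0.3,
    frozenset({"audio", "video"}): 1.0,
}

wam_score = weighted_mean_fusion([0.8, 0.4], [0.6, 0.4])
fi_score = choquet_integral(example_scores, example_measure)
```

Unlike the weighted mean, the fuzzy measure assigns an importance to every subset of sources rather than to each source alone, so it can express interaction between modalities. When the measure is additive, the Choquet integral reduces to the weighted arithmetic mean.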
Pages: 840-847
Page count: 8