Object category detection using audio-visual cues

Cited by: 0
Authors
Luo, Jie [1, 2]
Caputo, Barbara [1, 2]
Zweig, Alon [3 ]
Bach, Joerg-Hendrik [4 ]
Anemueller, Joern [4 ]
Affiliations
[1] IDIAP Res Inst, Ctr Parc, CH-1920 Martigny, Switzerland
[2] Swiss Fed Inst Technol, Lausanne, Switzerland
[3] Hebrew Univ Jerusalem, Jerusalem, Israel
[4] Carl von Ossietzky Univ Oldenburg, Oldenburg, Germany
Source
Keywords
object categorization; multimodal recognition; audio-visual fusion
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Categorization is one of the fundamental building blocks of cognitive systems. Object categorization has traditionally been addressed in the vision domain, even though cognitive agents are intrinsically multimodal. Indeed, biological systems combine several modalities in order to achieve robust categorization. In this paper we propose a multimodal approach to object category detection, using audio and visual information. The auditory channel is modeled on biologically motivated spectral features via a discriminative classifier. The visual channel is modeled by a state-of-the-art part-based model. Multimodality is achieved using two fusion schemes, one high-level and the other low-level. Experiments on six different object categories, under increasingly difficult conditions, show the strengths and weaknesses of the two approaches, and clearly underline the open challenges for multimodal category detection.
Pages: 539-548
Page count: 10