Object category detection using audio-visual cues

Authors
Luo, Jie [1, 2]
Caputo, Barbara [1, 2]
Zweig, Alon [3]
Bach, Joerg-Hendrik [4]
Anemueller, Joern [4]
Affiliations
[1] IDIAP Res Inst, Ctr Parc, CH-1920 Martigny, Switzerland
[2] Swiss Fed Inst Technol, Lausanne, Switzerland
[3] Hebrew Univ Jerusalem, Jerusalem, Israel
[4] Carl von Ossietzky Univ Oldenburg, Oldenburg, Germany
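Source
Keywords
object categorization; multimodal recognition; audio-visual fusion
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract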
Categorization is one of the fundamental building blocks of cognitive systems. Object categorization has traditionally been addressed in the vision domain, even though cognitive agents are intrinsically multimodal. Indeed, biological systems combine several modalities to achieve robust categorization. In this paper, we propose a multimodal approach to object category detection that uses audio and visual information. The auditory channel is modeled with biologically motivated spectral features and a discriminative classifier. The visual channel is modeled by a state-of-the-art part-based model. Multimodality is achieved using two fusion schemes, one high-level and the other low-level. Experiments on six object categories, under increasingly difficult conditions, show the strengths and weaknesses of the two approaches and clearly underline the open challenges for multimodal category detection.
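The abstract does not describe how the two fusion schemes are implemented. As a rough illustration only, the sketch below contrasts one common reading of the terms: low-level fusion as concatenating audio and visual feature vectors before a single scorer, and high-level fusion as a weighted mix of per-modality scores. The function names, the toy linear scorers, and the `audio_weight` parameter are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def linear_scorer(weights: Vector, bias: float = 0.0) -> Scorer:
    """Build a toy linear scorer, score(x) = w . x + bias (a stand-in for a real classifier)."""
    def score(x: Vector) -> float:
        if len(x) != len(weights):
            raise ValueError("feature length does not match weight length")
        return sum(w * v for w, v in zip(weights, x)) + bias
    return score


def low_level_fusion(audio: Vector, visual: Vector, joint_scorer: Scorer) -> float:
    """Assumed low-level (feature-level) fusion: concatenate features, then score once."""
    return joint_scorer(list(audio) + list(visual))


def high_level_fusion(
    audio: Vector,
    visual: Vector,
    audio_scorer: Scorer,
    visual_scorer: Scorer,
    audio_weight: float = 0.5,
) -> float:
    """Assumed high-level (decision-level) fusion: score each modality, then mix the scores."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return (audio_weight * audio_scorer(audio)
            + (1.0 - audio_weight) * visual_scorer(visual))


if __name__ == "__main__":
    audio_features = [1.0, 2.0]   # hypothetical audio descriptor
    visual_features = [3.0]       # hypothetical visual descriptor

    joint = linear_scorer([0.5, 0.5, 1.0])
    print(low_level_fusion(audio_features, visual_features, joint))

    audio_only = linear_scorer([1.0, 1.0])
    visual_only = linear_scorer([2.0])
    print(high_level_fusion(audio_features, visual_features,
                            audio_only, visual_only, audio_weight=0.25))
```

In this reading, the high-level scheme can down-weight a degraded modality through `audio_weight` without retraining either scorer, whereas the low-level scheme lets a single scorer exploit correlations between the two feature sets.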
Pages: 539-548
Page count: 10