Fusion of Classifier Predictions for Audio-Visual Emotion Recognition

Times cited: 0
Authors
Noroozi, Fatemeh [1]
Marjanovic, Marina [2]
Njegus, Angelina [2]
Escalera, Sergio [3]
Anbarjafari, Gholamreza [4]
Affiliations
[1] Univ Tartu, Inst Technol, EE-50411 Tartu, Estonia
[2] Singidunum Univ, Fac Tech Sci, Belgrade 11000, Serbia
[3] Univ Barcelona, Dept Math & Informat, Comp Vis Ctr, Barcelona, Spain
[4] Univ Tartu, Inst Technol, iCV Res Grp, EE-50411 Tartu, Estonia
Funding
European Union Horizon 2020;
Keywords
SYSTEM; REAL; AGE;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a novel multimodal emotion recognition system based on the analysis of audio and visual cues. Mel-frequency cepstral coefficient (MFCC) based features are extracted from the audio channel, and geometric relations between facial landmarks are computed from the visual data. Each feature set is learnt separately using state-of-the-art classifiers. In addition, we summarise each emotion video as a reduced set of key-frames, on which a Convolutional Neural Network is trained to discriminate emotions visually. Finally, the confidence outputs of all classifiers across all modalities define a new feature space, on which a final classifier is trained to predict the emotion, in a late-fusion (stacking) fashion. Experiments on the eNTERFACE'05 database show that the proposed system achieves significant performance improvements over state-of-the-art approaches.
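The final stacking step, in which the confidence outputs of every base classifier are fed to a second-level learner, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the base classifiers (MFCC audio, facial-landmark geometry, key-frame CNN) are assumed to have already produced per-class confidence vectors, the names `stack_confidences` and `NearestCentroidMeta` are hypothetical, and the nearest-centroid meta-learner is a stand-in chosen for brevity, since the abstract does not name the meta-classifier. All numbers are made-up toy values, not results on eNTERFACE'05.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def stack_confidences(per_modality: Sequence[Vector]) -> Vector:
    """Concatenate the per-class confidence vectors of all base classifiers
    (e.g. audio, geometric, key-frame CNN) into one meta-feature vector."""
    features: Vector = []
    for confidences in per_modality:
        features.extend(confidences)
    return features


class NearestCentroidMeta:
    """Hypothetical stand-in meta-learner: predicts the emotion whose mean
    meta-feature vector is closest in squared Euclidean distance."""

    def __init__(self) -> None:
        self.centroids: Dict[str, Vector] = {}

    def fit(self, X: Sequence[Vector], y: Sequence[str]) -> "NearestCentroidMeta":
        sums: Dict[str, Vector] = {}
        counts: Dict[str, int] = {}
        for x, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(x)
                counts[label] = 0
            sums[label] = [s + v for s, v in zip(sums[label], x)]
            counts[label] += 1
        # Average the accumulated meta-features per emotion label.
        self.centroids = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self

    def predict(self, x: Vector) -> str:
        def sq_dist(centroid: Vector) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


if __name__ == "__main__":
    # Toy confidences ordered [anger, happiness] from three hypothetical
    # base classifiers: audio (MFCC), facial geometry, key-frame CNN.
    train = [
        ([[0.8, 0.2], [0.6, 0.4], [0.7, 0.3]], "anger"),
        ([[0.7, 0.3], [0.7, 0.3], [0.6, 0.4]], "anger"),
        ([[0.3, 0.7], [0.2, 0.8], [0.4, 0.6]], "happiness"),
        ([[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]], "happiness"),
    ]
    X = [stack_confidences(mods) for mods, _ in train]
    y = [label for _, label in train]
    meta = NearestCentroidMeta().fit(X, y)

    # Audio leans towards happiness; both visual classifiers favour anger.
    query = stack_confidences([[0.4, 0.6], [0.7, 0.3], [0.8, 0.2]])
    print(meta.predict(query))  # anger
```

In the toy query the audio classifier alone would pick happiness, but the geometric and CNN confidences pull the stacked decision to anger, which is the kind of cross-modal correction late fusion is meant to provide. In a real pipeline the training meta-features would normally be out-of-fold predictions (for example from k-fold cross-validation of each base classifier), so the meta-learner does not learn from overconfident in-sample outputs.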
Pages: 61-66
Number of pages: 6
Related papers (50 in total)
  • [41] DISENTANGLEMENT FOR AUDIO-VISUAL EMOTION RECOGNITION USING MULTITASK SETUP
    Peri, Raghuveer
    Parthasarathy, Srinivas
    Bradshaw, Charles
    Sundaram, Shiva
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6344 - 6348
  • [42] Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition
    Lubis, Nurul
    Gomez, Randy
    Sakti, Sakriani
    Nakamura, Keisuke
    Yoshino, Koichiro
    Nakamura, Satoshi
    Nakadai, Kazuhiro
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 2180 - 2184
  • [43] Kernel Fusion of Audio and Visual Information for Emotion Recognition
    Wang, Yongjin
    Zhang, Rui
    Guan, Ling
    Venetsanopoulos, A. N.
    IMAGE ANALYSIS AND RECOGNITION: 8TH INTERNATIONAL CONFERENCE, ICIAR 2011, PT II, 2011, 6754 : 140 - 150
  • [44] Incongruity-Aware Cross-Modal Attention for Audio-Visual Fusion in Dimensional Emotion Recognition
    Praveen, R. Gnana
    Alam, Jahangir
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2024, 18 (03) : 444 - 458
  • [45] Empirical Study of Audio-Visual Features Fusion for Gait Recognition
    Castro, Francisco M.
    Marin-Jimenez, Manuel J.
    Guil, Nicolas
    COMPUTER ANALYSIS OF IMAGES AND PATTERNS, CAIP 2015, PT I, 2015, 9256 : 727 - 739
  • [46] Continuous Phoneme Recognition based on Audio-Visual Modality Fusion
    Richter, Julius
    Liebold, Jeanine
    Gerkmann, Timo
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [47] AUDIO-VISUAL FUSION AND CONDITIONING WITH NEURAL NETWORKS FOR EVENT RECOGNITION
    Brousmiche, Mathilde
    Rouat, Jean
    Dupont, Stephane
    2019 IEEE 29TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2019,
  • [48] Continuous audio-visual digit recognition using decision fusion
    Meyer, G
    Mulligan, J
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 305 - 308
  • [49] Multimodal Attentive Fusion Network for audio-visual event recognition
    Brousmiche, Mathilde
    Rouat, Jean
    Dupont, Stephane
    INFORMATION FUSION, 2022, 85 : 52 - 59
  • [50] A cortical circuit for audio-visual predictions
    Garner, Aleena R.
    Keller, Georg B.
    Nature Neuroscience, 2022, 25 : 98 - 105