Fusion of Classifier Predictions for Audio-Visual Emotion Recognition

Cited by: 0
Authors
Noroozi, Fatemeh [1]
Marjanovic, Marina [2]
Njegus, Angelina [2]
Escalera, Sergio [3]
Anbarjafari, Gholamreza [4]
Affiliations
[1] Univ Tartu, Inst Technol, EE-50411 Tartu, Estonia
[2] Singidunum Univ, Fac Tech Sci, Belgrade 11000, Serbia
[3] Univ Barcelona, Dept Math & Informat, Comp Vis Ctr, Barcelona, Spain
[4] Univ Tartu, Inst Technol, iCV Res Grp, EE-50411 Tartu, Estonia
Funding
EU Horizon 2020
Keywords
SYSTEM; REAL; AGE;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel multimodal emotion recognition system based on the analysis of audio and visual cues. MFCC-based features are extracted from the audio channel, and geometric relations between facial landmarks are computed from the visual data. Each feature set is learnt separately using state-of-the-art classifiers. In addition, we summarise each emotion video into a reduced set of key-frames, which a Convolutional Neural Network learns in order to discriminate emotions visually. Finally, the confidence outputs of all classifiers from all modalities define a new feature space that is learnt for the final emotion prediction, in a late fusion/stacking fashion. Experiments conducted on the eNTERFACE'05 database show significant performance improvements of the proposed system over state-of-the-art approaches.
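The late fusion/stacking step described in the abstract can be illustrated with a minimal sketch. It assumes three base classifiers (audio, facial landmarks, key-frame CNN) that each emit one confidence value per emotion. The confidence values, the three-emotion label set, and the nearest-centroid meta-learner are all hypothetical stand-ins chosen for brevity; the abstract does not say which classifier learns the fused feature space.

```python
from typing import Dict, Hashable, List, Sequence


def stack_confidences(per_classifier: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the confidence vectors of all base classifiers
    into a single meta-feature vector (the new feature space)."""
    return [c for confidences in per_classifier for c in confidences]


class NearestCentroidMeta:
    """Toy meta-learner (an assumption, not the paper's choice):
    predicts the label whose mean meta-feature vector is closest."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> "NearestCentroidMeta":
        sums: Dict[Hashable, List[float]] = {}
        counts: Dict[Hashable, int] = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, x: Sequence[float]) -> Hashable:
        def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return min(self.centroids_, key=lambda lbl: squared_distance(x, self.centroids_[lbl]))


# Made-up confidences over (anger, happiness, sadness) from the three
# hypothetical base classifiers: [audio, landmarks, key-frame CNN].
train = [
    ("anger",     [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]),
    ("anger",     [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1]]),
    ("happiness", [[0.2, 0.7, 0.1], [0.1, 0.8, 0.1], [0.2, 0.6, 0.2]]),
    ("happiness", [[0.3, 0.6, 0.1], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]]),
    ("sadness",   [[0.1, 0.2, 0.7], [0.2, 0.1, 0.7], [0.1, 0.1, 0.8]]),
    ("sadness",   [[0.2, 0.2, 0.6], [0.1, 0.3, 0.6], [0.2, 0.1, 0.7]]),
]

X_meta = [stack_confidences(confs) for _, confs in train]
y_meta = [label for label, _ in train]
meta = NearestCentroidMeta().fit(X_meta, y_meta)

# The audio classifier alone leans towards "happiness" here, but the
# fused vector still lies closest to the "anger" centroid.
query = stack_confidences([[0.4, 0.5, 0.1], [0.6, 0.3, 0.1], [0.7, 0.2, 0.1]])
prediction = meta.predict(query)
print(prediction)
```

In practice the meta-learner would be trained on confidences produced for held-out samples, so that it does not simply inherit the base classifiers' overfitting to their own training data.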
Pages: 61-66
Number of pages: 6
Related papers
50 in total
  • [1] Semantic audio-visual data fusion for automatic emotion recognition
    Datcu, Dragos
    Rothkrantz, Leon J. M.
    EUROMEDIA '2008, 2008, : 58 - 65
  • [2] Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition
    Praveen, R. Gnana
    Granger, Eric
    Cardinal, Patrick
    2021 16TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2021), 2021
  • [3] Audio-visual spontaneous emotion recognition
    Zeng, Zhihong
    Hu, Yuxiao
    Roisman, Glenn I.
    Wen, Zhen
    Fu, Yun
    Huang, Thomas S.
    ARTIFICIAL INTELLIGENCE FOR HUMAN COMPUTING, 2007, 4451 : 72 - +
  • [4] Audio-Visual Domain Adaptation Feature Fusion for Speech Emotion Recognition
    Wei, Jie
    Hu, Guanyu
    Yang, Xinyu
    Luu, Anh Tuan
    Dong, Yizhuo
    INTERSPEECH 2022, 2022, : 1988 - 1992
  • [5] Audio-Visual Fusion Network Based on Conformer for Multimodal Emotion Recognition
    Guo, Peini
    Chen, Zhengyan
    Li, Yidi
    Liu, Hong
    ARTIFICIAL INTELLIGENCE, CICAI 2022, PT II, 2022, 13605 : 315 - 326
  • [6] Audio-Visual Attention Networks for Emotion Recognition
    Lee, Jiyoung
    Kim, Sunok
    Kim, Seungryong
    Sohn, Kwanghoon
    AVSU'18: PROCEEDINGS OF THE 2018 WORKSHOP ON AUDIO-VISUAL SCENE UNDERSTANDING FOR IMMERSIVE MULTIMEDIA, 2018, : 27 - 32
  • [7] Audio-Visual Learning for Multimodal Emotion Recognition
    Fan, Siyu
    Jing, Jianan
    Wang, Chongwen
    SYMMETRY-BASEL, 2025, 17 (03):
  • [8] Deep operational audio-visual emotion recognition
    Akturk, Kaan
    Keceli, Ali Seydi
    NEUROCOMPUTING, 2024, 588
  • [9] Audio-Visual Emotion Recognition in Video Clips
    Noroozi, Fatemeh
    Marjanovic, Marina
    Njegus, Angelina
    Escalera, Sergio
    Anbarjafari, Gholamreza
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2019, 10 (01) : 60 - 75
  • [10] Continuous Emotion Recognition with Audio-visual Leader-follower Attentive Fusion
    Zhang, Su
    Ding, Yi
    Wei, Ziquan
    Guan, Cuntai
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3560 - 3567