Fusion of Classifier Predictions for Audio-Visual Emotion Recognition

Cited by: 0
Authors
Noroozi, Fatemeh [1 ]
Marjanovic, Marina [2 ]
Njegus, Angelina [2 ]
Escalera, Sergio [3 ]
Anbarjafari, Gholamreza [4 ]
Affiliations
[1] Univ Tartu, Inst Technol, EE-50411 Tartu, Estonia
[2] Singidunum Univ, Fac Tech Sci, Belgrade 11000, Serbia
[3] Univ Barcelona, Dept Math & Informat, Comp Vis Ctr, Barcelona, Spain
[4] Univ Tartu, Inst Technol, iCV Res Grp, EE-50411 Tartu, Estonia
Funding
EU Horizon 2020
Keywords
SYSTEM; REAL; AGE;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel multimodal emotion recognition system based on the analysis of audio and visual cues. MFCC-based features are extracted from the audio channel, and facial landmark geometric relations are computed from the visual data. Both sets of features are learnt separately using state-of-the-art classifiers. In addition, each emotion video is summarised into a reduced set of key-frames, which are learnt by a Convolutional Neural Network in order to visually discriminate emotions. Finally, the confidence outputs of all classifiers from all modalities define a new feature space that is learnt for final emotion prediction, in a late fusion/stacking fashion. Experiments conducted on the eNTERFACE'05 database show significant performance improvements of the proposed system over state-of-the-art approaches.
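The late fusion/stacking step can be illustrated with a short sketch: each modality-specific classifier emits class-confidence scores, the scores are concatenated into a new feature vector, and a meta-classifier is trained on that vector to produce the final emotion prediction. The classifier choices below (SVC, RandomForestClassifier, LogisticRegression), the variable names, and the assumption that the key-frame CNN confidences are precomputed are all illustrative; they are not the paper's actual models.

```python
# Minimal sketch of confidence-level (late) fusion via stacking.
# Classifier choices and feature shapes are illustrative assumptions only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

def stacked_fusion(X_audio, X_geom, X_cnn_conf, y,
                   X_audio_te, X_geom_te, X_cnn_conf_te):
    """Fuse per-modality class confidences with a meta-classifier."""
    # Modality-specific classifiers (stand-ins for the paper's models).
    audio_clf = SVC(probability=True).fit(X_audio, y)        # MFCC-based features
    geom_clf = RandomForestClassifier(n_estimators=200).fit(X_geom, y)  # landmark geometry

    # Confidence (class-probability) outputs define the new feature space.
    # For a rigorous stacking setup these training confidences would be
    # produced with out-of-fold predictions to avoid label leakage.
    train_conf = np.hstack([
        audio_clf.predict_proba(X_audio),
        geom_clf.predict_proba(X_geom),
        X_cnn_conf,        # key-frame CNN confidences, assumed precomputed
    ])
    test_conf = np.hstack([
        audio_clf.predict_proba(X_audio_te),
        geom_clf.predict_proba(X_geom_te),
        X_cnn_conf_te,
    ])

    # Meta-classifier learnt on the stacked confidences for final prediction.
    meta = LogisticRegression(max_iter=1000).fit(train_conf, y)
    return meta.predict(test_conf)
```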
Pages: 61 - 66
Page count: 6