Multilingual Speech Emotion Recognition System based on a Three-layer Model

Cited by: 9
Authors
Li, Xingfeng [1]
Akagi, Masato [1]
Affiliations
[1] Japan Adv Inst Sci & Technol, 1-1 Asahidai, Nomi, Ishikawa 9231292, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
emotion recognition in speech; three-layer model; emotion dimension;
DOI
10.21437/Interspeech.2016-645
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Speech Emotion Recognition (SER) systems currently focus on classifying emotions within a single language. Because optimal acoustic feature sets are strongly language dependent, selecting common features and retraining remain challenging when building a generalized SER system that works across multiple languages. In this paper, we therefore present an SER system for a multilingual scenario from the perspective of human perceptual processing. The goal is twofold. First, to predict multilingual emotion dimensions as accurately as human annotators do. To this end, a three-layer model consisting of acoustic features, semantic primitives, and emotion dimensions, together with a Fuzzy Inference System (FIS), was studied. Second, drawing on knowledge of how humans perceive emotion across languages in the dimensional space, we adopt direction and distance as common features to detect multilingual emotions. The estimation performance for emotion dimensions is comparable to human evaluation, and the classification rates are close to those achieved by monolingual SER systems.
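The second goal in the abstract, using direction and distance in the dimensional space as common features, can be illustrated with a minimal sketch. It assumes a two-dimensional valence-arousal plane and a neutral reference point from which each utterance's position is measured. The function name `direction_distance`, the choice of two dimensions, and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math


def direction_distance(point, neutral=(0.0, 0.0)):
    """Return (angle_deg, distance) of `point` relative to `neutral`.

    `point` and `neutral` are (valence, arousal) pairs; this 2-D layout
    is an assumption made for illustration only.
    """
    dv = point[0] - neutral[0]  # displacement along the valence axis
    da = point[1] - neutral[1]  # displacement along the arousal axis
    distance = math.hypot(dv, da)  # Euclidean distance from neutral
    # atan2 returns an angle in (-180, 180] degrees; map it to [0, 360)
    angle_deg = math.degrees(math.atan2(da, dv)) % 360.0
    return angle_deg, distance


# Hypothetical estimated position of one utterance
angle, dist = direction_distance((1.0, 1.0))
print(f"direction = {angle:.1f} deg, distance = {dist:.3f}")
```

Expressing a position as an angle and a radius relative to a neutral point, rather than as raw coordinates, is one way such features could stay comparable when the neutral position differs between languages or speakers.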
Pages: 3608 - 3612
Page count: 5
Related papers
50 in total
  • [31] THE GENERALIZATION EFFECT FOR MULTILINGUAL SPEECH EMOTION RECOGNITION ACROSS HETEROGENEOUS LANGUAGES
    Lee, Shi-wook
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5881 - 5885
  • [32] Context-Independent Multilingual Emotion Recognition from Speech Signals
    Vladimir Hozjan
    Zdravko Kačič
    International Journal of Speech Technology, 2003, 6 (3) : 311 - 320
  • [33] The Generalization Effect for Multilingual Speech Emotion Recognition across Heterogeneous Languages
    Lee, Shi-Wook
    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2019, 2019-May : 5881 - 5885
  • [34] Transformer-Based Multilingual Speech Emotion Recognition Using Data Augmentation and Feature Fusion
    Al-onazi, Badriyya B.
    Nauman, Muhammad Asif
    Jahangir, Rashid
    Malik, Muhmmad Mohsin
    Alkhammash, Eman H.
    Elshewey, Ahmed M.
    APPLIED SCIENCES-BASEL, 2022, 12 (18):
  • [35] MSFL: Explainable Multitask-Based Shared Feature Learning for Multilingual Speech Emotion Recognition
    Ma, Yiping
    Wang, Wei
    APPLIED SCIENCES-BASEL, 2022, 12 (24):
  • [36] MULTI-OBJECTIVE HEURISTIC FEATURE SELECTION FOR SPEECH-BASED MULTILINGUAL EMOTION RECOGNITION
    Brester, Christina
    Semenkin, Eugene
    Sidorov, Maxim
    JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH, 2016, 6 (04) : 243 - 253
  • [37] Multilingual Speech Emotion Research Based on Data Mining
    Yu, Yanting
    ADVANCES IN MULTIMEDIA, 2022, 2022
  • [38] Hysteresis in the underdamped three-layer model
    Jia, Li-Ping
    Tekic, Jasmina
    SCIENTIFIC REPORTS, 2020, 10 (01)
  • [39] Three-layer model for exchange anisotropy
    Rezende, SM
    Azevedo, A
    de Aguiar, FM
    Fermin, JR
    Egelhoff, WF
    Parkin, SSP
    PHYSICAL REVIEW B, 2002, 66 (06) : 641091 - 641096
  • [40] A three-layer system for image retrieval
    Zhong, Daidi
    Defee, Irek
    SIGMAP 2007: PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA APPLICATIONS, 2007, : 208 - +