Multilingual Speech Emotion Recognition System based on a Three-layer Model

Cited by: 9

Authors:

Li, Xingfeng [1]

Akagi, Masato [1]

Affiliations:

[1] Japan Adv Inst Sci & Technol, 1-1 Asahidai, Nomi, Ishikawa 9231292, Japan

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Funding:

Japan Society for the Promotion of Science (JSPS);

Keywords:

emotion recognition in speech; three-layer model; emotion dimension;

DOI:

10.21437/Interspeech.2016-645

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Speech emotion recognition (SER) systems currently focus on classifying emotions within a single language. Because the optimal acoustic feature sets are strongly language dependent, building a generalized SER system that works across multiple languages remains challenging, in particular the selection of common features and the need for retraining. In this paper, we therefore present an SER system for a multilingual scenario from the perspective of human perceptual processing. The goal is twofold. First, we aim to predict multilingual emotion dimensions as accurately as human annotations. To this end, a three-layer model consisting of acoustic features, semantic primitives, and emotion dimensions was studied, along with a Fuzzy Inference System (FIS). Second, drawing on knowledge of how humans perceive emotion across languages in the dimensional space, we adopt direction and distance as common features to detect multilingual emotions. The estimated emotion dimensions are comparable to human evaluation, and the classification rates are close to those achieved by monolingual SER systems.
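
The abstract names direction and distance in the emotion dimensional space as the common features, but does not spell out how they are computed. The following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a two-dimensional valence-arousal plane and a per-language neutral reference point; the function name `direction_and_distance` and the tuple layout are hypothetical.

```python
import math


def direction_and_distance(point, neutral):
    """Return (direction_deg, distance) of `point` relative to `neutral`.

    Both arguments are (valence, arousal) pairs. This layout is an
    assumption made for illustration; the abstract does not specify it.
    """
    dv = point[0] - neutral[0]  # displacement along the valence axis
    da = point[1] - neutral[1]  # displacement along the arousal axis
    distance = math.hypot(dv, da)  # Euclidean length of the displacement
    # Angle of the displacement, mapped from (-180, 180] to [0, 360) degrees.
    direction = math.degrees(math.atan2(da, dv)) % 360.0
    return direction, distance


if __name__ == "__main__":
    # Hypothetical estimate for one utterance and a hypothetical neutral point.
    print(direction_and_distance((0.6, 0.8), (0.1, 0.1)))
```

Expressing each estimate relative to a language-specific neutral point is the design choice illustrated here: a classifier would then see the displacement from neutral rather than absolute coordinates, which is one way such features could be shared across languages.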


Pages: 3608-3612

Page count: 5
