A Three-Layer Emotion Perception Model for Valence and Arousal-Based Detection from Multilingual Speech

Cited by: 9
Authors
Li, Xingfeng [1 ]
Akagi, Masato [1 ]
Affiliations
[1] Japan Adv Inst Sci & Technol, Nomi, Japan
Keywords
emotion recognition; emotion dimension; three-layer model; prosodic feature; spectrogram; glottal waveform; RECOGNITION; EXPRESSION; FEATURES; QUALITY;
DOI
10.21437/Interspeech.2018-1820
CLC classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automated emotion detection from speech has recently shifted from monolingual to multilingual tasks to support human-like interaction in real-life settings, where a system must handle more than a single input language. However, most work on monolingual emotion detection is difficult to generalize to multiple languages, because the optimal feature sets differ from one language to another. Our study proposes a framework to design, implement, and validate an emotion detection system using multiple corpora. A continuous dimensional space of valence and arousal is first used to describe emotions. A three-layer model incorporating fuzzy inference systems is then used to estimate the two dimensions. Speech features derived from prosody, the spectrogram, and the glottal waveform are examined and selected to capture emotional cues. The new system outperformed the existing state-of-the-art system, yielding a smaller mean absolute error and a higher correlation between estimates and human evaluations. Moreover, results under speaker-independent validation are comparable to those of human evaluators.
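To make the three-layer idea concrete, the sketch below shows one way such a pipeline could be structured: acoustic features (layer 1) are mapped to degrees of semantic primitives (layer 2), which are then aggregated into an emotion-dimension score (layer 3). This is a minimal illustration only; the feature names, primitives, membership breakpoints, and weights are all invented for the example, and the weighted aggregation is a crude stand-in for the fuzzy inference systems used in the paper.

```python
# Hypothetical sketch of a three-layer emotion perception mapping:
# acoustic features -> semantic primitives -> an emotion dimension (arousal).
# All names, breakpoints, and weights here are illustrative assumptions,
# not the authors' actual model parameters.

def triangular(x, a, b, c):
    """Triangular fuzzy membership function on [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def primitives_from_features(f0_mean, energy_mean):
    """Layer 1 -> layer 2: map normalized acoustic features (roughly 0..1)
    to membership degrees of hypothetical semantic primitives."""
    return {
        "bright": triangular(f0_mean, 0.4, 0.8, 1.2),
        "loud": triangular(energy_mean, 0.4, 0.8, 1.2),
        "calm": triangular(0.5 * (f0_mean + energy_mean), -0.2, 0.2, 0.6),
    }

def arousal_from_primitives(p):
    """Layer 2 -> layer 3: weighted aggregation of primitive degrees
    into an arousal estimate clamped to [-1, 1]."""
    weights = {"bright": 0.5, "loud": 0.7, "calm": -0.8}
    num = sum(weights[k] * p[k] for k in p)
    den = sum(p.values()) or 1.0  # avoid division by zero
    return max(-1.0, min(1.0, num / den))
```

Under these toy settings, a high-pitched, high-energy utterance yields a positive arousal score, while a low, quiet one yields a negative score, matching the intuition the layered decomposition is meant to capture.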
Pages: 3643-3647 (5 pages)
Related Papers
50 items total
  • [1] Multilingual Speech Emotion Recognition System based on a Three-layer Model
    Li, Xingfeng
    Akagi, Masato
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3608 - 3612
  • [2] Improving multilingual speech emotion recognition by combining acoustic features in a three-layer model
    Li, Xingfeng
    Akagi, Masato
    SPEECH COMMUNICATION, 2019, 110 : 1 - 12
  • [3] Cross-lingual Speech Emotion Recognition System Based on a Three-Layer Model for Human Perception
    Elbarougy, Reda
    Akagi, Masato
    2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013,
  • [4] Improving speech emotion dimensions estimation using a three-layer model of human perception
    Elbarougy, Reda
    Akagi, Masato
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2014, 35 (02) : 86 - 98
  • [5] Hierarchical speech emotion recognition using the valence-arousal model
    Haque, Arijul
    Rao, K. Sreenivasa
    Multimedia Tools and Applications, 2025, 84 (14) : 14029 - 14046
  • [6] Analyzing Emotional Oscillatory Brain Network for Valence and Arousal-Based Emotion Recognition Using EEG Data
    Yan, Jianzhuo
    Kuai, Hongzhi
    Chen, Jianhui
    Zhong, Ning
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2019, 18 (04) : 1359 - 1378
  • [7] Maximal Information Coefficient and Predominant Correlation-Based Feature Selection Toward A Three-Layer Model for Speech Emotion Recognition
    Li, Xingfeng
    Akagi, Masato
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1428 - 1434
  • [8] Emotion sensing from physiological signals using three defined areas in arousal-valence model
    Wiem, Mimoun Ben Henia
    Lachiri, Zied
    2017 INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND DIAGNOSIS (ICCAD), 2017, : 219 - 223
  • [9] Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition
    Jalal, Md Asif
    Milner, Rosanna
    Hain, Thomas
    INTERSPEECH 2020, 2020, : 4113 - 4117
  • [10] Towards a Framework for Multimodal Creativity States Detection from Emotion, Arousal, and Valence
    Kalateh, Sepideh
    Hojjati, Sanaz Nikghadam
    Barata, Jose
    COMPUTATIONAL SCIENCE, ICCS 2024, PT III, 2024, 14834 : 79 - 86