Speech Databases, Speech Features, and Classifiers in Speech Emotion Recognition: A Review

Cited by: 0

Authors:

Dar, G. H. Mohmad [1]

Delhibabu, Radhakrishnan [2]

Affiliations:

[1] Vellore Inst Technol, Sch Adv Sci, Vellore 632014, Tamil Nadu, India

[2] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Speech emotion recognition; machine learning; deep learning; affective computing; support vector machine; random forest; Gaussian mixture model; audio features; databases; classifiers; COMMUNICATING EMOTION; INFORMATION FUSION; REPRESENTATIONS; IMPLEMENTATION; AUTOENCODER; GENERATION; DEPRESSION; EXPRESSION; NETWORKS; VALENCE;

DOI:

10.1109/ACCESS.2024.3476960

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Emotion recognition from speech signals plays a crucial role in Human-Machine Interaction (HMI), particularly in the development of applications such as affective computing and interactive systems. This review seeks to provide an in-depth examination of current methodologies in speech emotion recognition (SER), with a focus on databases, feature extraction techniques, and classification models. Traditionally, SER has relied on low-level descriptors (LLDs) such as Mel-Frequency Cepstral Coefficients (MFCCs), linear predictive coding (LPC), and pitch-based features, combined with classifiers such as Support Vector Machines (SVM), Random Forests (RF), and Gaussian Mixture Models (GMM). The development of deep learning techniques, however, has completely changed the field: models such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have proven better at capturing the complex temporal and spectral features of speech. This paper reviews prominent speech emotion datasets, exploring their linguistic diversity, annotation processes, and emotional labels. It also analyzes the efficacy of different speech features and classifiers in handling challenges such as data imbalance, limited data availability, and cross-lingual variations. The review highlights the need for future work to address real-time processing, context-sensitive emotion detection, and the integration of multi-modal data to enhance the performance of SER systems. By consolidating recent advancements and identifying areas for further research, this paper aims to provide a clearer path for optimizing feature extraction and classification techniques in the field of emotion recognition.

Citation

Pages: 151122 - 151152

Number of pages: 31

50 entries in total

[21] Speech Emotion Recognition with Cross-lingual Databases
Chiou, Bo-Chang
Chen, Chia-Ping
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 558 - 561
[22] Emotion recognition from speech: a review
Koolagudi, Shashidhar G.
Rao, K. Sreenivasa
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2012, 15 (02) : 99 - 117
[23] From Simulated Speech to Natural Speech, What are the Robust Features for Emotion Recognition?
Li, Ya
Chao, Linlin
Liu, Yazhu
Bao, Wei
Tao, Jianhua
2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2015, : 368 - 373
[24] Improvement of Speech Emotion Recognition by Deep Convolutional Neural Network and Speech Features
Mohanty, Aniruddha
Cherukuri, Ravindranath C.
Prusty, Alok Ranjan
THIRD CONGRESS ON INTELLIGENT SYSTEMS, CIS 2022, VOL 1, 2023, 608 : 117 - 129
[25] Emotion recognition of mandarin speech for different speech corpora based on nonlinear features
Gao, Hui
Chen, Shanguang
An, Ping
Su, Guangchuan
PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012, : 567 - +
[26] Novel acoustic features for speech emotion recognition
Roh, Yong-Wan
Kim, Dong-Ju
Lee, Woo-Seok
Hong, Kwang-Seok
SCIENCE IN CHINA (SERIES E: TECHNOLOGICAL SCIENCES), 2009, 52 (07) : 1838 - 1848
[27] Exploiting the potentialities of features for speech emotion recognition
Li, Dongdong
Zhou, Yijun
Wang, Zhe
Gao, Daqi
INFORMATION SCIENCES, 2021, 548 : 328 - 343
[28] Significance of Phonological Features in Speech Emotion Recognition
Wang, Wei
Watters, Paul A.
Cao, Xinyi
Shen, Lingjie
Li, Bo
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 : 633 - 642
[29] Learning Transferable Features for Speech Emotion Recognition
Marczewski, Alison
Veloso, Adriano
Ziviani, Nivio
PROCEEDINGS OF THE THEMATIC WORKSHOPS OF ACM MULTIMEDIA 2017 (THEMATIC WORKSHOPS'17), 2017, : 529 - 536
[30] Applying articulatory features to speech emotion recognition
Zhou, Yu
Sun, Yanqing
Yang, Lin
Yan, Yonghong
2009 INTERNATIONAL CONFERENCE ON RESEARCH CHALLENGES IN COMPUTER SCIENCE, ICRCCS 2009, 2009, : 73 - 76
