Speech Emotion Recognition via Sparse Learning-Based Fusion Model

Cited by: 0
|
Authors
Min, Dong-Jin [1 ]
Kim, Deok-Hwan [1 ]
Affiliations
[1] Inha Univ, Dept Elect & Comp Engn, Incheon 22212, South Korea
Source
IEEE ACCESS | 2024, Vol. 12
Funding
National Research Foundation, Singapore;
Keywords
Emotion recognition; Speech recognition; Hidden Markov models; Feature extraction; Brain modeling; Accuracy; Convolutional neural networks; Data models; Time-domain analysis; Deep learning; 2D convolutional neural network squeeze and excitation network; multivariate long short-term memory-fully convolutional network; late fusion; sparse learning; FEATURES; DATABASES; ATTENTION; NETWORK;
DOI
10.1109/ACCESS.2024.3506565
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Speech communication is a powerful tool for conveying intentions and emotions, fostering mutual understanding, and strengthening relationships. In the realm of natural human-computer interaction, speech-emotion recognition plays a crucial role. This process involves three stages: dataset collection, feature extraction, and emotion classification. Collecting speech-emotion recognition datasets is a complex and costly process, leading to limited data volumes and uneven emotional distributions. This scarcity and imbalance pose significant challenges, affecting the accuracy and reliability of emotion recognition. To address these issues, this study introduces a novel model that is more robust and adaptive. We employ the Ranking Magnitude Method (RMM) based on sparse learning. We use the Root Mean Square (RMS) energy and Zero Crossing Rate (ZCR) as temporal features to measure the speech's overall volume and noise intensity. The Mel Frequency Cepstral Coefficient (MFCC) is utilized to extract critical speech features, which are then integrated into a multivariate Long Short-Term Memory-Fully Convolutional Network (LSTM-FCN) model. We analyze the utterance levels using the log-Mel spectrogram for spatial features, processing these patterns through a 2D Convolutional Neural Network Squeeze and Excitation Network (CNN-SEN) model. The core of our method is a Sparse Learning-Based Fusion Model (SLBF), which addresses dataset imbalances by selectively retraining the underperforming nodes. This dynamic adjustment of learning priorities significantly enhances the robustness and accuracy of emotion recognition. Using this approach, our model outperforms state-of-the-art methods across multiple datasets, achieving impressive accuracy rates of 97.18%, 97.92%, 99.31%, and 96.89% for the EMOVO, RAVDESS, SAVEE, and EMO-DB datasets, respectively.
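The record itself contains no code, so the following is only a minimal sketch of the pipeline the abstract outlines, assuming librosa for feature extraction and PyTorch for the two branches: frame-level RMS energy, ZCR, and MFCCs feed a recurrent temporal branch, the log-Mel spectrogram feeds a 2D CNN branch with a squeeze-and-excitation block, and the branch outputs are combined by late fusion. All hyperparameters (n_mfcc=40, n_mels=64, hidden sizes, seven emotion classes) are illustrative assumptions, a plain LSTM stands in for the paper's multivariate LSTM-FCN, a single SE-augmented convolution stands in for the CNN-SEN, and the sparse-learning retraining of underperforming nodes (RMM/SLBF) is only hinted at by learnable fusion weights, not reproduced.

# Illustrative sketch, not the authors' code: extracts the temporal features
# (RMS energy, ZCR, MFCC) and the spatial feature (log-Mel spectrogram) named
# in the abstract, then combines two simplified branch classifiers by late fusion.
import numpy as np
import librosa
import torch
import torch.nn as nn


def extract_features(path, sr=16000, n_mfcc=40, n_mels=64):
    """Return (temporal_seq, log_mel) for one utterance.

    temporal_seq: (frames, 2 + n_mfcc) matrix of [RMS, ZCR, MFCC...] per frame,
                  intended for the temporal (LSTM) branch.
    log_mel:      (1, n_mels, frames) log-Mel spectrogram for the 2D CNN branch.
    """
    y, sr = librosa.load(path, sr=sr)
    rms = librosa.feature.rms(y=y)                           # (1, frames)
    zcr = librosa.feature.zero_crossing_rate(y)              # (1, frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)           # (n_mels, frames)
    temporal = np.vstack([rms, zcr, mfcc]).T                 # (frames, 2 + n_mfcc)
    return temporal.astype(np.float32), log_mel[None].astype(np.float32)


class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight CNN channels by global context."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # squeeze -> (B, C)
        return x * w[:, :, None, None]           # excite: per-channel rescaling


class LateFusionSER(nn.Module):
    """Two-branch late-fusion classifier; learnable fusion weights only
    approximate the paper's sparse-learning (RMM/SLBF) reweighting."""
    def __init__(self, n_temporal=42, n_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(n_temporal, 128, batch_first=True)    # temporal branch
        self.temporal_head = nn.Linear(128, n_classes)
        self.cnn = nn.Sequential(                                  # spatial branch
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), SEBlock(32),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.spatial_head = nn.Linear(32, n_classes)
        self.fusion_w = nn.Parameter(torch.ones(2))                # late-fusion weights

    def forward(self, temporal, log_mel):
        _, (h, _) = self.lstm(temporal)                  # h: (1, B, 128)
        p1 = self.temporal_head(h[-1]).softmax(-1)       # temporal-branch probabilities
        p2 = self.spatial_head(self.cnn(log_mel)).softmax(-1)
        w = self.fusion_w.softmax(0)
        return w[0] * p1 + w[1] * p2                     # fused class probabilities

For a single utterance the model expects a temporal tensor of shape (1, frames, 42) and a log-Mel tensor of shape (1, 1, 64, frames); the learnable fusion weights are a conceptual stand-in only, since the paper's SLBF selectively retrains underperforming nodes rather than just reweighting branch outputs.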
Pages: 177219-177235
Number of pages: 17
Related Papers
50 records in total
  • [21] Cross-Corpus Speech Emotion Recognition Based on Sparse Subspace Transfer Learning
    Zhao, Keke
    Song, Peng
    Zhang, Wenjing
    Zhang, Weijian
    Li, Shaokai
    Chen, Dongliang
    Zheng, Wenming
    BIOMETRIC RECOGNITION (CCBR 2021), 2021, 12878 : 466 - 473
  • [22] A Subset of Acoustic Features for Machine Learning-based and Statistical Approaches in Speech Emotion Recognition
    Costantini, Giovanni
    Cesarini, Valerio
    Casali, Daniele
    BIOSIGNALS: PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES - VOL 4: BIOSIGNALS, 2022, : 257 - 264
  • [23] Bi-Feature Selection Deep Learning-Based Techniques for Speech Emotion Recognition
    Akinpelu, Samson
    Viriri, Serestina
    ADVANCES IN VISUAL COMPUTING, ISVC 2024, PT I, 2025, 15046 : 345 - 356
  • [24] Speech Emotion Recognition based on Multiple Feature Fusion
    Jiang, Changjiang
    Mao, Rong
    Liu, Geng
    Wang, Mingyi
    2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 907 - 912
  • [25] ANN based Decision Fusion for Speech Emotion Recognition
    Xu, Lu
    Xu, Mingxing
    Yang, Dali
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2003 - +
  • [26] Speech emotion recognition based on a modified brain emotional learning model
    Motamed, Sara
    Setayeshi, Saeed
    Rabiee, Azam
    BIOLOGICALLY INSPIRED COGNITIVE ARCHITECTURES, 2017, 19 : 32 - 38
  • [27] Speech Emotion Recognition Based on Robust Discriminative Sparse Regression
    Song, Peng
    Zheng, Wenming
    Yu, Yanwei
    Ou, Shifeng
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2021, 13 (02) : 343 - 353
  • [28] Speech Emotion Recognition Based on Learning Automata in
    Motamed, Sara
    Setayeshi, Saeed
    Farhoudi, Zeinab
    Ahmadi, Ali
    JOURNAL OF MATHEMATICS AND COMPUTER SCIENCE-JMCS, 2014, 12 (03): : 173 - 185
  • [29] Design of smart home system speech emotion recognition model based on ensemble deep learning and feature fusion
    Wang, Mengsheng
    Ma, Hongbin
    Wang, Yingli
    Sun, Xianhe
    APPLIED ACOUSTICS, 2024, 218
  • [30] Novel feature fusion method for speech emotion recognition based on multiple kernel learning
    Zhao, L. (zhaoli@seu.edu.cn), 1600, Southeast University (29):