STUDY OF DENSE NETWORK APPROACHES FOR SPEECH EMOTION RECOGNITION

Cited by: 0

Authors:

Abdelwahab, Mohammed [1]

Busso, Carlos [1]

Affiliation:

[1] Univ Texas Dallas, Dept Elect Comp Engn, Multimodal Signal Proc MSP Lab, Richardson, TX 75080 USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Speech emotion recognition; Deep Neural Networks; NEURAL-NETWORKS;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Deep neural networks have proven very effective in various classification problems and show great promise for emotion recognition from speech. Studies have proposed various architectures that further improve the performance of emotion recognition systems. However, there are still various open questions regarding the best approach to building a speech emotion recognition system. Would the system's performance improve if we had more labeled data? How much do we benefit from data augmentation? Which activation and regularization schemes are more beneficial? How does the depth of the network affect the performance? We are collecting the MSP-Podcast corpus, a large dataset with over 30 hours of data, which provides an ideal resource to address these questions. This study explores various dense architectures to predict arousal, valence and dominance scores. We investigate varying the training set size, the width and depth of the network, and the activation functions used during training. We also study the effect of data augmentation on the network's performance. We find that a bigger training set improves the performance. Batch normalization is crucial to achieving good performance for deeper networks. We do not observe significant differences in performance between residual networks and dense networks.

Pages: 5084-5088

Number of pages: 5