Speech emotion recognition by using complex MFCC and deep sequential model

Cited by: 23
Author
Patnaik, Suprava [1]
Affiliation
[1] Kalinga Inst Ind Technol, Sch Elect, Bhubaneswar, Odisha, India
Keywords
Speech emotion; MFCC; Emotion circumplex; 1-D CNN; NEURAL-NETWORKS; CLASSIFICATION; FEATURES;
DOI
10.1007/s11042-022-13725-y
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Speech Emotion Recognition (SER) is one of the front-line research areas. For a machine, inferring emotion from speech is difficult because emotions are subjective and annotation is challenging. Nevertheless, researchers consider SER feasible because speech is quasi-stationary and emotions are declarative finite states. This paper addresses emotion classification using Complex Mel Frequency Cepstral Coefficients (c-MFCC) as the representative trait and a deep sequential model as the classifier. The experimental setup is speaker independent and accommodates marginal variations in the underlying phonemes. Testing has been carried out on the RAVDESS and TESS databases. Conceptually, the proposed model is erogenous towards prosody observance. The main contributions of this work are twofold. First, it introduces the concept of c-MFCC and investigates it as a robust cue of emotion, thereby leading to a significant improvement in accuracy. Second, it establishes a correlation between MFCC-based accuracy and Russell's emotional circumplex pattern. According to Russell's 2D emotion circumplex model, emotional signals are combinations of several psychological dimensions, though they are perceived as discrete categories. The results of this work are obtained from a deep sequential LSTM model. The proposed c-MFCC are found to be more robust to signal framing and informative in terms of spectral roll-off, and are therefore put forward as the input to the classifier. For the RAVDESS database, the best accuracy achieved is 78.8% for fourteen classes, which improves to 91.6% for eight gender-integrated classes and 98.5% for six affective-separated classes. Although the RAVDESS dataset contains two analogous sentences, the reported results are for the complete dataset, without any phonetic separation of the samples. Thus, the proposed method appears to be semi-commutative on phonemes. The results obtained from this study are presented and discussed in the form of confusion matrices.
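The abstract does not define how c-MFCC are computed, so the sketch below is not the author's method. As background only, it shows the conventional real-valued MFCC pipeline (framing, Hamming window, power spectrum, mel filterbank, log, DCT-II) that a frame-wise sequence classifier such as an LSTM would typically consume. All function names and parameter values (16 kHz sampling, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients) are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # Common O'Shaughnessy mel-scale formula
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    # Standard real-valued MFCC (assumed defaults), NOT the paper's c-MFCC
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each windowed frame
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel-band energies; a small offset avoids log(0)
    log_energy = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # Unnormalized DCT-II over the filter axis, keeping the first n_coeffs
    n = np.arange(n_filters)
    q = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * q * (2 * n[None, :] + 1) / (2 * n_filters))
    return log_energy @ basis.T            # shape: (n_frames, n_coeffs)

if __name__ == "__main__":
    t = np.arange(16000) / 16000.0
    tone = np.sin(2 * np.pi * 440.0 * t)   # 1 s synthetic test tone
    print(mfcc(tone).shape)
```

Each row of the returned matrix is one time step, which is the shape a sequential model expects; how the paper extends this to complex-valued coefficients is not stated in this record.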
Pages: 11897 - 11922
Number of pages: 26
Related papers
50 in total
  • [31] Developing a negative speech emotion recognition model for safety systems using deep learning
    Jena, Shreya
    Basak, Sneha
    Agrawal, Himanshi
    Saini, Bunny
    Gite, Shilpa
    Kotecha, Ketan
    Alfarhood, Sultan
    JOURNAL OF BIG DATA, 2025, 12 (01)
  • [32] Speech Emotion Recognition with Deep Learning
    Harar, Pavol
    Burget, Radim
    Dutta, Malay Kishore
    2017 4TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2017, : 137 - 140
  • [33] Speech emotion recognition using scalogram based deep structure
    Aghajani K.
    Esmaili Paeen Afrakoti I.
    International Journal of Engineering, Transactions B: Applications, 2020, 33 (02): : 285 - 292
  • [34] Speech Emotion Recognition Using Deep Learning Techniques: A Review
    Khalil, Ruhul Amin
    Jones, Edward
    Babar, Mohammad Inayatullah
    Jan, Tariqullah
    Zafar, Mohammad Haseeb
    Alhussain, Thamer
    IEEE ACCESS, 2019, 7 : 117327 - 117345
  • [35] A Study on Speech Emotion Recognition Using a Deep Neural Network
    Lee, Kyong Hee
    Choi, Hyun Kyun
    Jang, Byung Tae
    Kim, Do Hyun
    2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC): ICT CONVERGENCE LEADING THE AUTONOMOUS FUTURE, 2019, : 1162 - 1165
  • [36] Emotion recognition from speech using deep learning on spectrograms
    Li, Xingguang
    Song, Wenjun
    Liang, Zonglin
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (03) : 2791 - 2796
  • [37] Speech Emotion Recognition Using Deep Learning on audio recordings
    Suganya, S.
    Charles, E. Y. A.
    2019 19TH INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER - 2019), 2019,
  • [38] Speech Emotion Recognition Using Scalogram Based Deep Structure
    Aghajani, K.
    Afrakoti, I. Esmaili Paeen
    INTERNATIONAL JOURNAL OF ENGINEERING, 2020, 33 (02): : 285 - 292
  • [39] F0, LPC, and MFCC Analysis for Emotion Recognition Based on Speech
    Teixeira, Felipe L.
    Teixeira, Joao Paulo
    Soares, Salviano F. P.
    Pio Abreu, J. L.
    OPTIMIZATION, LEARNING ALGORITHMS AND APPLICATIONS, OL2A 2022, 2022, 1754 : 389 - 404
  • [40] English Language Speech Recognition using MFCC and HMM
    Naithani, Kanchan
    Thakkar, V. M.
    Semwal, Ashish
    2018 IEEE INTERNATIONAL CONFERENCE ON RESEARCH IN INTELLIGENT AND COMPUTING IN ENGINEERING (RICE III), 2018,