Speech emotion recognition by using complex MFCC and deep sequential model

Cited by: 23
Author
Patnaik, Suprava [1]
Affiliation
[1] Kalinga Inst Ind Technol, Sch Elect, Bhubaneswar, Odisha, India
Keywords
Speech emotion; MFCC; Emotion circumplex; 1-D CNN; Neural networks; Classification; Features
DOI
10.1007/s11042-022-13725-y
CLC Number
TP [Automation and Computer Technology]
Subject Classification Code
0812
Abstract
Speech Emotion Recognition (SER) is one of the front-line research areas. For a machine, inferring emotion is difficult because emotions are subjective and annotation is challenging. Nevertheless, researchers believe SER is feasible because speech is quasi-stationary and emotions are declarative finite states. This paper addresses emotion classification using Complex Mel Frequency Cepstral Coefficients (c-MFCC) as the representative feature and a deep sequential model as the classifier. The experimental setup is speaker-independent and accommodates marginal variations in the underlying phonemes. Testing has been carried out on the RAVDESS and TESS databases. Conceptually, the proposed model is attentive to prosody. The main contributions of this work are twofold: first, introducing the concept of c-MFCC and investigating it as a robust cue of emotion, thereby achieving a significant improvement in accuracy; second, establishing a correlation between MFCC-based accuracy and Russell's emotion circumplex pattern. As per Russell's 2D emotion circumplex model, emotional signals are combinations of several psychological dimensions, though they are perceived as discrete categories. Results of this work are obtained from a deep sequential LSTM model. The proposed c-MFCC features are found to be more robust to signal framing and more informative in terms of spectral roll-off, and are therefore put forward as the input to the classifier. For the RAVDESS database the best accuracy achieved is 78.8% for fourteen classes, which improves to 91.6% for eight gender-integrated classes and 98.5% for six affect-separated classes. Although the RAVDESS dataset contains two analogous sentences, the reported results are for the complete dataset, without any phonetic separation of the samples. Thus, the proposed method appears to be only weakly dependent on phonetic content. Results of this study are presented and discussed in the form of confusion matrices.
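Neither the abstract nor this record specifies how the c-MFCC front end or the LSTM classifier is implemented. The sketch below is a minimal illustration only, assuming librosa and TensorFlow/Keras; the phase-retaining "complex MFCC" construction, function names, layer sizes, and hyperparameters are all assumptions for illustration, not the authors' published method.

```python
# Minimal sketch: an MFCC-style front end plus a deep sequential
# (stacked LSTM) classifier for speech emotion recognition.
# ASSUMPTION: the phase-retaining "complex MFCC" below is one plausible
# reading of c-MFCC, not the formulation from the paper; layer sizes
# and hyperparameters are placeholders.

import numpy as np
import scipy.fftpack
import librosa
import tensorflow as tf

def complex_mfcc(y, sr, n_mfcc=40, n_fft=512, hop=256, n_mels=64):
    """Apply the mel filterbank to the complex STFT, then DCT the
    log-magnitude and (unwrapped) phase channels separately and stack them."""
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop)            # complex spectrogram
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    mel_cplx = mel_fb @ stft                                        # complex mel spectrum
    log_mag = np.log1p(np.abs(mel_cplx))                            # magnitude channel
    phase = np.unwrap(np.angle(mel_cplx), axis=0)                   # phase channel
    c_mag = scipy.fftpack.dct(log_mag, axis=0, norm="ortho")[:n_mfcc]
    c_ph = scipy.fftpack.dct(phase, axis=0, norm="ortho")[:n_mfcc]
    return np.concatenate([c_mag, c_ph], axis=0).T                  # (frames, 2*n_mfcc)

def build_classifier(n_classes, n_feats=80):
    """Deep sequential model: two stacked LSTMs feeding a softmax head."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(None, n_feats)),               # variable-length sequences
        tf.keras.layers.LSTM(128, return_sequences=True),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

if __name__ == "__main__":
    y, sr = librosa.load(librosa.ex("trumpet"), sr=16000)           # stand-in for a RAVDESS clip
    feats = complex_mfcc(y, sr)
    model = build_classifier(n_classes=8, n_feats=feats.shape[1])   # e.g. eight RAVDESS emotions
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    print(feats.shape, model.output_shape)
```

Under this reading, the class groupings reported in the abstract (fourteen gender-split classes, eight gender-integrated classes, six affect-separated classes) would only change `n_classes` and the label mapping, not the feature pipeline.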
Pages: 11897-11922
Number of pages: 26
Related Papers
50 records in total
  • [21] Speech Recognition using MFCC and DTW
    Mohan, Bhadragiri Jagan
    Babu, Ramesh N.
    2014 INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL ENGINEERING (ICAEE), 2014,
  • [22] ASERNet: Automatic speech emotion recognition system using MFCC-based LPC approach with deep learning CNN
    Jagadeeshwar, Kalyanapu
    Sreenivasarao, T.
    Pulicherla, Padmaja
    Satyanarayana, K. N. V.
    Lakshmi, K. Mohana
    Kumar, Pala Mahesh
    INTERNATIONAL JOURNAL OF MODELING SIMULATION AND SCIENTIFIC COMPUTING, 2023, 14 (04)
  • [23] Data augmentation using a 1D-CNN model with MFCC/MFMC features for speech emotion recognition
    Flower, Thomas Mary Little
    Jaya, Thirasama
    Singh, Sreedharan Christopher Ezhil
    AUTOMATIKA, 2024, 65 (04) : 1325 - 1338
  • [24] Stressed Speech Emotion Recognition using feature fusion of Teager Energy Operator and MFCC
    Bandela, Surekha Reddy
    Kumar, T. Kishore
    2017 8TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2017,
  • [25] AUTOMATIC EMOTION RECOGNITION IN SPEECH SIGNAL USING TEAGER ENERGY OPERATOR AND MFCC FEATURES
    He, Ling
    Lech, Margaret
    Allen, Nicholas
    2011 3RD INTERNATIONAL CONFERENCE ON COMPUTER TECHNOLOGY AND DEVELOPMENT (ICCTD 2011), VOL 3, 2012, : 695 - 699
  • [26] Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model
    Mishra, Swami
    Bhatnagar, Nehal
    Prakasam, P.
    Sureshkumar, T. R.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (13) : 37603 - 37620
  • [28] Speech Emotion Recognition using Deep Dropout Autoencoders
    Pal, Arghya
Baskar, S.
    2015 IEEE INTERNATIONAL CONFERENCE ON ENGINEERING AND TECHNOLOGY (ICETECH), 2015, : 124 - 129
  • [29] Throat Microphone Speech Recognition using MFCC
    Vijayan, Amritha
    Mathai, Bipil Mary
    Valsalan, Karthik
    Johnson, Riyanka Raji
    Mathew, Lani Rachel
    Gopakumar, K.
    2017 INTERNATIONAL CONFERENCE ON NETWORKS & ADVANCES IN COMPUTATIONAL TECHNOLOGIES (NETACT), 2017, : 392 - 395
  • [30] Deep learning based Affective Model for Speech Emotion Recognition
    Zhou, Xi
    Guo, Junqi
    Bie, Rongfang
    2016 INT IEEE CONFERENCES ON UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING AND COMMUNICATIONS, CLOUD AND BIG DATA COMPUTING, INTERNET OF PEOPLE, AND SMART WORLD CONGRESS (UIC/ATC/SCALCOM/CBDCOM/IOP/SMARTWORLD), 2016, : 841 - 846