Speech emotion recognition by using complex MFCC and deep sequential model

Cited by: 23
Author
Patnaik, Suprava [1]
Affiliation
[1] Kalinga Inst Ind Technol, Sch Elect, Bhubaneswar, Odisha, India
Keywords
Speech emotion; MFCC; Emotion circumplex; 1-D CNN; Neural networks; Classification; Features
DOI
10.1007/s11042-022-13725-y
CLC Number
TP [Automation and Computer Technology]
Subject Classification Code
0812
Abstract
Speech Emotion Recognition (SER) is one of the front-line research areas. For a machine, inferring emotion is difficult because emotions are subjective and annotation is challenging. Nevertheless, researchers believe SER is feasible because speech is quasi-stationary and emotions are declarative finite states. This paper addresses emotion classification using Complex Mel Frequency Cepstral Coefficients (c-MFCC) as the representative feature and a deep sequential model as the classifier. The experimental setup is speaker-independent and accommodates marginal variations in the underlying phonemes. Testing has been carried out on the RAVDESS and TESS databases. Conceptually, the proposed model is attentive to prosody. The main contributions of this work are twofold: first, introducing the concept of c-MFCC and investigating it as a robust cue of emotion, thereby achieving a significant improvement in accuracy; second, establishing a correlation between MFCC-based accuracy and Russell's emotion circumplex pattern. As per Russell's 2D emotion circumplex model, emotional signals are combinations of several psychological dimensions, though they are perceived as discrete categories. Results of this work are obtained from a deep sequential LSTM model. The proposed c-MFCC features are found to be more robust to signal framing and more informative in terms of spectral roll-off, and are therefore put forward as the input to the classifier. For the RAVDESS database the best accuracy achieved is 78.8% for fourteen classes, which improves to 91.6% for eight gender-integrated classes and 98.5% for six affect-separated classes. Although the RAVDESS dataset contains two analogous sentences, the reported results are for the complete dataset, without any phonetic separation of the samples. Thus, the proposed method appears to be only weakly dependent on phonetic content. Results of this study are presented and discussed in the form of confusion matrices.
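Neither the abstract nor this record specifies how the c-MFCC front end or the LSTM classifier is implemented. The sketch below is a minimal illustration only, assuming librosa and TensorFlow/Keras; the phase-retaining "complex MFCC" construction, function names, layer sizes, and hyperparameters are all assumptions for illustration, not the authors' published method.

```python
# Minimal sketch: an MFCC-style front end plus a deep sequential
# (stacked LSTM) classifier for speech emotion recognition.
# ASSUMPTION: the phase-retaining "complex MFCC" below is one plausible
# reading of c-MFCC, not the formulation from the paper; layer sizes
# and hyperparameters are placeholders.

import numpy as np
import scipy.fftpack
import librosa
import tensorflow as tf

def complex_mfcc(y, sr, n_mfcc=40, n_fft=512, hop=256, n_mels=64):
    """Apply the mel filterbank to the complex STFT, then DCT the
    log-magnitude and (unwrapped) phase channels separately and stack them."""
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop)            # complex spectrogram
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    mel_cplx = mel_fb @ stft                                        # complex mel spectrum
    log_mag = np.log1p(np.abs(mel_cplx))                            # magnitude channel
    phase = np.unwrap(np.angle(mel_cplx), axis=0)                   # phase channel
    c_mag = scipy.fftpack.dct(log_mag, axis=0, norm="ortho")[:n_mfcc]
    c_ph = scipy.fftpack.dct(phase, axis=0, norm="ortho")[:n_mfcc]
    return np.concatenate([c_mag, c_ph], axis=0).T                  # (frames, 2*n_mfcc)

def build_classifier(n_classes, n_feats=80):
    """Deep sequential model: two stacked LSTMs feeding a softmax head."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(None, n_feats)),               # variable-length sequences
        tf.keras.layers.LSTM(128, return_sequences=True),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

if __name__ == "__main__":
    y, sr = librosa.load(librosa.ex("trumpet"), sr=16000)           # stand-in for a RAVDESS clip
    feats = complex_mfcc(y, sr)
    model = build_classifier(n_classes=8, n_feats=feats.shape[1])   # e.g. eight RAVDESS emotions
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    print(feats.shape, model.output_shape)
```

Under this reading, the class groupings reported in the abstract (fourteen gender-split classes, eight gender-integrated classes, six affect-separated classes) would only change `n_classes` and the label mapping, not the feature pipeline.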
Pages: 11897-11922
Number of pages: 26
Related Papers
50 records in total
  • [21] Speech Recognition using MFCC and DTW
    Mohan, Bhadragiri Jagan
    Babu, Ramesh N.
    2014 INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL ENGINEERING (ICAEE), 2014,
  • [22] ASERNet: Automatic speech emotion recognition system using MFCC-based LPC approach with deep learning CNN
    Jagadeeshwar, Kalyanapu
    Sreenivasarao, T.
    Pulicherla, Padmaja
    Satyanarayana, K. N. V.
    Lakshmi, K. Mohana
    Kumar, Pala Mahesh
    INTERNATIONAL JOURNAL OF MODELING SIMULATION AND SCIENTIFIC COMPUTING, 2023, 14 (04)
  • [23] Data augmentation using a 1D-CNN model with MFCC/MFMC features for speech emotion recognition
    Flower, Thomas Mary Little
    Jaya, Thirasama
    Singh, Sreedharan Christopher Ezhil
    AUTOMATIKA, 2024, 65 (04) : 1325 - 1338
  • [24] Stressed Speech Emotion Recognition using feature fusion of Teager Energy Operator and MFCC
    Bandela, Surekha Reddy
    Kumar, T. Kishore
    2017 8TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2017,
  • [25] AUTOMATIC EMOTION RECOGNITION IN SPEECH SIGNAL USING TEAGER ENERGY OPERATOR AND MFCC FEATURES
    He, Ling
    Lech, Margaret
    Allen, Nicholas
    2011 3RD INTERNATIONAL CONFERENCE ON COMPUTER TECHNOLOGY AND DEVELOPMENT (ICCTD 2011), VOL 3, 2012, : 695 - 699
  • [26] Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model
    Mishra, Swami
    Bhatnagar, Nehal
    Prakasam, P.
    Sureshkumar, T. R.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (13) : 37603 - 37620
  • [28] Speech Emotion Recognition using Deep Dropout Autoencoders
    Pal, Arghya
Baskar, S.
    2015 IEEE INTERNATIONAL CONFERENCE ON ENGINEERING AND TECHNOLOGY (ICETECH), 2015, : 124 - 129
  • [29] Throat Microphone Speech Recognition using MFCC
    Vijayan, Amritha
    Mathai, Bipil Mary
    Valsalan, Karthik
    Johnson, Riyanka Raji
    Mathew, Lani Rachel
    Gopakumar, K.
    2017 INTERNATIONAL CONFERENCE ON NETWORKS & ADVANCES IN COMPUTATIONAL TECHNOLOGIES (NETACT), 2017, : 392 - 395
  • [30] Deep learning based Affective Model for Speech Emotion Recognition
    Zhou, Xi
    Guo, Junqi
    Bie, Rongfang
    2016 INT IEEE CONFERENCES ON UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING AND COMMUNICATIONS, CLOUD AND BIG DATA COMPUTING, INTERNET OF PEOPLE, AND SMART WORLD CONGRESS (UIC/ATC/SCALCOM/CBDCOM/IOP/SMARTWORLD), 2016, : 841 - 846