Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients

Cited by: 1
Authors
Manju D. Pawar
Rajendra D. Kokate
Affiliations
[1] Maharashtra Institute of Technology
[2] Government College of Engineering
Keywords
Convolution neural network; Feature extraction; Speech emotion recognition; Energy; Pitch
DOI
Not available
Abstract
Speech Emotion Recognition (SER) plays a significant role in affective computing and human-computer interfaces. In the literature, the most widely adopted approach to emotion recognition combines simple feature extraction with a simple classifier, and most such methods recognise emotion with limited efficiency. To address these drawbacks, this paper proposes five models based on the Convolution Neural Network (CNN) for recognising emotion from speech signals. The proposed methodology uses a CNN together with feature extraction to recognise seven emotions: disgust, normal, fear, joy, anger, sadness and surprise. First, emotional speech signals are collected from a database such as the Berlin database. Features are then extracted using pitch and energy, Mel-Frequency Cepstral Coefficients (MFCC) and Mel Energy Spectrum Dynamic Coefficients (MEDC). These feature extraction methods are widely used for classifying speech data and perform well. Mel-cepstral coefficients take less time to shape the spectrum with adequate data and offer better voice quality. The extracted features are passed to the CNN for recognition. The proposed CNN contains one or more pairs of convolution and max-pooling layers, and it recognises the emotion from the input speech signal. The proposed method is implemented in MATLAB and compared with an existing method, Linear Prediction Cepstral Coefficients (LPCC) with a K-Nearest Neighbour (KNN) classifier, on test samples to evaluate performance. Performance is analysed using statistical measures: accuracy, precision, specificity, recall, sensitivity, error rate, the receiver operating characteristic (ROC) curve, area under the curve (AUC), and False Positive Rate (FPR).
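The count-based measures named in the abstract (accuracy, precision, specificity, recall or sensitivity, error rate and FPR) can all be derived from a confusion matrix; ROC and AUC additionally need classifier scores and are not covered here. The following is a minimal Python sketch, not the authors' MATLAB implementation. The function name `one_vs_rest_metrics` and the 3-class example matrix are hypothetical, and it is assumed that each emotion is scored one-vs-rest.

```python
from typing import Dict, List


def one_vs_rest_metrics(confusion: List[List[int]], cls: int) -> Dict[str, float]:
    """Binary (one-vs-rest) metrics for class `cls`.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]                              # true positives
    fn = sum(confusion[cls]) - tp                         # missed samples of cls
    fp = sum(confusion[i][cls] for i in range(n)) - tp    # other classes labelled cls
    tn = total - tp - fn - fp                             # everything else

    def ratio(num: int, den: int) -> float:
        # Guard against empty denominators (e.g. a class with no samples).
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, total),
        "error_rate": ratio(fp + fn, total),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),        # recall and sensitivity coincide
        "specificity": ratio(tn, tn + fp),
        "fpr": ratio(fp, fp + tn),           # equals 1 - specificity
    }


# Hypothetical confusion matrix for three emotion classes (illustrative only).
EXAMPLE = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
metrics = one_vs_rest_metrics(EXAMPLE, 0)
```

Averaging the per-class values over all seven emotions (macro averaging) is one common way to report a single figure, though the abstract does not say which averaging the authors used.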
Pages: 15563-15587
Page count: 24