Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients

Cited by: 1

Authors:

Manju D. Pawar

Rajendra D. Kokate

Affiliations:

[1] Maharashtra Institute of Technology

[2] Government College of Engineering

Source:

Multimedia Tools and Applications | 2021 / Vol. 80

Keywords:

Convolution neural network; Feature extraction; Speech emotion recognition; Energy; Pitch;

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Speech Emotion Recognition (SER) plays a significant role in applications of affective computing and human-computer interfaces. In the literature, the most widely adopted approach to emotion recognition pairs simple feature extraction with a simple classifier, and most such methods recognise emotion with limited efficiency. To address these drawbacks, this paper proposes five models based on the Convolution Neural Network (CNN) for recognising emotion from speech signals. The proposed methodology uses a CNN with feature extraction methods to recognise seven emotions: disgust, normal, fear, joy, anger, sadness and surprise. First, speech emotion signals are collected from a database such as the Berlin database. Features are then extracted using pitch and energy, Mel-Frequency Cepstral Coefficients (MFCC) and Mel Energy Spectrum Dynamic Coefficients (MEDC). These feature extraction methods are widely used for classifying speech data and perform well. Mel-cepstral coefficients take less time to shape the spectrum with adequate data and offer better voice quality. The extracted features are passed to the CNN for recognition. The proposed CNN contains one or more pairs of convolution and max-pooling layers. Using this network, emotions are recognised from the input speech signal. The proposed method is implemented in MATLAB and compared with an existing method, Linear Prediction Cepstral Coefficients (LPCC) with a K-Nearest Neighbour (KNN) classifier, on test samples for performance evaluation. Performance is analysed using statistical measures such as accuracy, precision, specificity, recall, sensitivity, error rate, the receiver operating characteristic (ROC) curve, the area under the curve (AUC) and False Positive Rate (FPR).
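As an illustration of the evaluation measures named in the abstract, the following is a minimal Python sketch, not the authors' MATLAB implementation. It assumes a one-vs-rest treatment of each emotion class, which the abstract does not specify. The function names `one_vs_rest_counts` and `binary_metrics` are hypothetical, and the labels in the usage note are made up for illustration.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FP, TN, FN for one emotion class treated as 'positive'.

    Assumption: every other emotion label is lumped together as 'negative'.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Derive the standard confusion-matrix measures from raw counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "error_rate": (fp + fn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall and sensitivity are the same quantity: TP / (TP + FN).
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # False Positive Rate equals 1 - specificity.
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }
```

For example, `binary_metrics(*one_vs_rest_counts(["anger", "joy"], ["anger", "anger"], "anger"))` scores the class "anger" against all other labels. Sweeping a decision threshold over classifier scores and plotting recall against FPR would trace the ROC curve. That step is omitted here because it needs per-sample scores rather than hard labels.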

Pages: 15563-15587

Page count: 24
