Variational mode decomposition based acoustic and entropy features for speech emotion recognition

Cited by: 18
Authors
Mishra, Siba Prasad [1]
Warule, Pankaj [1]
Deb, Suman [1]
Affiliations
[1] Sardar Vallabhbhai Natl Inst Technol, Surat, Gujarat, India
Keywords
Deep neural network; Speech emotion recognition; MFCC; Permutation entropy; Approximate entropy; APPROXIMATE ENTROPY; FEATURE-EXTRACTION; CLASSIFICATION; DEEP
DOI
10.1016/j.apacoust.2023.109578
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Automated speech emotion recognition (SER) is a machine-based method for identifying emotion from speech signals. SER has many practical applications, including improving man-machine interaction (MMI), online customer support, healthcare services, and online marketing. Because of this wide range of applications, SER has attracted growing interest from researchers over the past three decades. Numerous studies have employed various combinations of features and classifiers to improve emotion classification performance. In this study, we pursue the same goal using variational mode decomposition (VMD)-based features. We extracted MFCC, mel-spectrogram, approximate entropy (ApEn), and permutation entropy (PrEn) features from each VMD mode. Emotion classification performance is evaluated with a deep neural network (DNN) classifier, using the proposed VMD-based features both individually (MFCC, mel-spectrogram, ApEn, and PrEn) and in combination (MFCC + mel-spectrogram + ApEn + PrEn). We evaluated the approach on two datasets and obtained classification accuracies of 91.59% on EMO-DB and 80.83% on RAVDESS. Compared with other methods, the proposed VMD-based feature combination with a DNN classifier performed better than state-of-the-art works in SER.
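The two entropy features named in the abstract can be illustrated with a minimal NumPy sketch. It assumes the VMD modes have already been computed elsewhere and are passed in as a 2-D array (one row per mode). The function names, the embedding order and delay, and the tolerance r = 0.2 x standard deviation are illustrative assumptions, not settings taken from the paper.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy (0 to 1) of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    if n <= 0:
        raise ValueError("signal too short for this order and delay")
    span = (order - 1) * delay + 1
    # Each row is one delay-embedded vector of length `order`.
    embedded = np.array([x[i:i + span:delay] for i in range(n)])
    # The ordinal pattern of a vector is the permutation that sorts it.
    ranks = np.argsort(embedded, axis=1, kind="stable")
    counts = Counter(map(tuple, ranks.tolist()))
    probs = np.array(list(counts.values()), dtype=float) / n
    entropy = -np.sum(probs * np.log(probs))
    return float(entropy / math.log(math.factorial(order)))


def approximate_entropy(x, m=2, r=None):
    """Approximate entropy with self-matches counted (Pincus-style)."""
    x = np.asarray(x, dtype=float)
    if len(x) <= m + 1:
        raise ValueError("signal too short for this embedding dimension")
    if r is None:
        r = 0.2 * np.std(x)  # assumed default tolerance, not from the paper

    def phi(k):
        n = len(x) - k + 1
        emb = np.array([x[i:i + k] for i in range(n)])
        # Chebyshev distance between every pair of embedded vectors.
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        # Fraction of vectors within tolerance r of each vector.
        c = np.mean(dist <= r, axis=1)
        return np.mean(np.log(c))

    return float(phi(m) - phi(m + 1))


def entropy_features(modes):
    """Concatenate [ApEn, PrEn] for each row of a (K, N) array of modes."""
    feats = []
    for mode in np.atleast_2d(modes):
        feats.append(approximate_entropy(mode))
        feats.append(permutation_entropy(mode))
    return np.array(feats)
```

For K modes, `entropy_features` returns a vector of length 2K, which could be concatenated with per-mode MFCC and mel-spectrogram features before classification. The pairwise distance matrix in `approximate_entropy` grows quadratically with signal length, so long utterances would need framing or a more memory-efficient implementation.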
Pages: 12
Related papers
50 in total
  • [41] Recognition of denatured biological tissue based on variational mode decomposition and multi-scale permutation entropy
    Liu Bei
    Hu Wei-Peng
    Zou Xiao
    Ding Ya-Jun
    Qian Sheng-You
    ACTA PHYSICA SINICA, 2019, 68 (02)
  • [42] Machine learning techniques for speech emotion recognition using paralinguistic acoustic features
    Jha T.
    Kavya R.
    Christopher J.
    Arunachalam V.
    International Journal of Speech Technology, 2022, 25 (03) : 707 - 725
  • [43] Gradient-Based Acoustic Features for Speech Recognition
    Muroi, Takashi
    Takashima, Ryoichi
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2009 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ISPACS 2009), 2009, : 445 - 448
  • [44] Speech emotion recognition based on genetic algorithm-decision tree fusion of deep and acoustic features
    Sun, Linhui
    Li, Qiu
    Fu, Sheng
    Li, Pingan
    ETRI JOURNAL, 2022, 44 (03) : 462 - 475
  • [45] Speech Emotion Recognition Using Spectral Entropy
    Lee, Woo-Seok
    Roh, Yong-Wan
    Kim, Dong-Ju
    Kim, Jung-Hyun
    Hong, Kwang-Seok
    INTELLIGENT ROBOTICS AND APPLICATIONS, PT II, PROCEEDINGS, 2008, 5315 : 45 - 54
  • [46] Acoustic features extraction for emotion recognition
    Rong, Jia
    Chen, Yi-Ping Phoebe
    Chowdhury, Morshed
    Li, Gang
    6TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE, PROCEEDINGS, 2007, : 419 - +
  • [47] Speech emotion recognition based on prosodic segment level features
    Han, Wenjing
    Li, Haifeng
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2009, 49 (SUPPL. 1) : 1363 - 1368
  • [48] Investigating Graph-based Features for Speech Emotion Recognition
    Pentari, Anastasia
    Kafentzis, George
    Tsiknakis, Manolis
    2022 IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS (BHI) JOINTLY ORGANISED WITH THE IEEE-EMBS INTERNATIONAL CONFERENCE ON WEARABLE AND IMPLANTABLE BODY SENSOR NETWORKS (BSN'22), 2022,
  • [49] Automatic speech based emotion recognition using paralinguistics features
    Hook, J.
    Noroozi, F.
    Toygar, O.
    Anbarjafari, G.
    BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2019, 67 (03) : 479 - 488
  • [50] NMF-based Cepstral Features for Speech Emotion Recognition
    Lashkari, Milad
    Seyedin, Sanaz
    2018 4TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS), 2018, : 189 - 193