Hybrid CNN-BiLSTM architecture with multiple attention mechanisms to enhance speech emotion recognition

Cited by: 0
Authors
Poorna, S. S. [1]
Menon, Vivek [2]
Gopalan, Sundararaman [1]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Dept Elect & Commun Engn, Amritapuri, India
[2] Amrita Vishwa Vidyapeetham, Dept Comp Sci & Engn, Amrita Sch Comp, Amritapuri, India
Keywords
SER; CNN; BiLSTM; Mel spectrograms; MFCC; Time-frequency attention; CONVOLUTIONAL NEURAL-NETWORKS; 2D CNN; FEATURES; RECURRENT; REPRESENTATIONS; DATABASES; MODEL;
DOI
10.1016/j.bspc.2024.106967
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
In recent years, attention mechanisms in deep learning have been increasingly used to boost the performance of Speech Emotion Recognition (SER) models. However, these models exhibit shortcomings in jointly emphasizing time-frequency and dynamic sequential variations, often under-utilizing contextual emotion-related information. We propose a hybrid deep learning model for SER using Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory Networks (BiLSTM) with multiple attention mechanisms. Our model uses features from the speech waveform, viz. Mel spectrograms and Mel Frequency Cepstral Coefficients (MFCC), along with their time derivatives, as input to the CNN and BiLSTM modules, respectively. A Time-Frequency Attention (TFA) mechanism, optimally incorporated into the CNN, helps selectively focus on emotion-related energy-time-frequency variations in Mel spectrograms. The attention BiLSTM uses MFCC and its time derivatives to identify the positional information of emotion, addressing the dynamic sequential variations. Finally, we fuse the attention-learned features from the CNN and BiLSTM modules and feed them to a Deep Neural Network (DNN) for SER. The experiments were carried out on three different datasets: Emo-DB and IEMOCAP, which are public datasets, and Amritaemo_Arabic, a private dataset. The hybrid model exhibited superior performance on both the public and private datasets, achieving an average SER accuracy of 94.62%, 67.85%, and 95.80% on the Emo-DB, IEMOCAP, and Amritaemo_Arabic datasets, respectively, outperforming several state-of-the-art models.
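The sequential branch and the fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the attention formulation, so dot-product attention pooling is assumed here, the delta is a plain frame-to-frame difference, and the names `delta`, `attention_pool`, `fuse`, `query`, and `cnn_features` are hypothetical. In the paper the attention operates on BiLSTM outputs; the sketch applies it directly to the input frames to stay dependency-free.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def delta(frames):
    """First-order time derivative of per-frame features.

    Assumption: a simple difference between consecutive frames; the
    abstract does not state how the derivatives are computed.
    """
    deltas = [[0.0] * len(frames[0])]  # the first frame has no predecessor
    for prev, cur in zip(frames, frames[1:]):
        deltas.append([c - p for p, c in zip(prev, cur)])
    return deltas


def attention_pool(frames, query):
    """Weight each frame by softmax(frame . query), then sum over time."""
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


def fuse(cnn_features, sequence_features):
    """Fuse the two branches by concatenation (assumed fusion strategy)."""
    return list(cnn_features) + list(sequence_features)


# Toy MFCC matrix: 3 frames x 2 coefficients (made-up values).
mfcc = [[1.0, 0.0], [2.0, 1.0], [4.0, 1.0]]
# Each frame becomes [MFCC, delta-MFCC], as in the sequential branch.
seq_input = [m + d for m, d in zip(mfcc, delta(mfcc))]
# Hypothetical attention query; in a trained model this would be learned.
query = [0.5, 0.0, 0.5, 0.0]
pooled, weights = attention_pool(seq_input, query)
# Placeholder for the attention-refined CNN features of the Mel spectrogram.
cnn_features = [0.2, 0.7]
fused = fuse(cnn_features, pooled)
```

The `fused` vector stands in for the input to the final DNN classifier. The attention weights sum to one over the time axis, so frames that score higher against the query contribute more to the pooled representation.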
Pages: 13