Hybrid CNN-BiLSTM architecture with multiple attention mechanisms to enhance speech emotion recognition

Cited by: 0
Authors
Poorna, S. S. [1]
Menon, Vivek [2]
Gopalan, Sundararaman [1]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Dept Elect & Commun Engn, Amritapuri, India
[2] Amrita Vishwa Vidyapeetham, Dept Comp Sci & Engn, Amrita Sch Comp, Amritapuri, India
Keywords
SER; CNN; BiLSTM; Mel spectrograms; MFCC; Time-frequency attention; CONVOLUTIONAL NEURAL-NETWORKS; 2D CNN; FEATURES; RECURRENT; REPRESENTATIONS; DATABASES; MODEL;
DOI
10.1016/j.bspc.2024.106967
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
In recent years, the concept of attention in deep learning has been increasingly used to boost the performance of Speech Emotion Recognition (SER) models. However, these models for SER exhibit shortcomings in jointly emphasizing the time-frequency and dynamic sequential variations, often under-utilizing contextual emotion-related information. We propose a hybrid deep learning model for SER using Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory Networks (BiLSTM) with multiple attention mechanisms. Our model utilizes features from the speech waveform, viz. Mel spectrograms and Mel Frequency Cepstral Coefficients (MFCC), along with their time derivatives, as input to the CNN and BiLSTM modules, respectively. A Time-Frequency Attention (TFA) mechanism, optimally incorporated into the CNN, helps selectively focus on emotion-related energy-time-frequency variations in Mel spectrograms. The attention BiLSTM uses MFCC and its time derivatives to identify the positional information of emotion for addressing the dynamic sequential variations. Finally, we fuse the attention-learned features from the CNN and BiLSTM modules and feed them to a Deep Neural Network (DNN) for SER. The experiments were carried out on three different datasets: Emo-DB and IEMOCAP, which are public datasets, and Amritaemo_Arabic, a private dataset. The hybrid model exhibited superior performance on both the public and private datasets, generating an average SER accuracy of 94.62%, 67.85%, and 95.80% with the Emo-DB, IEMOCAP, and Amritaemo_Arabic datasets, respectively, effectively outperforming several state-of-the-art models.
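The abstract names three generic building blocks: time derivatives of frame-level features, attention over a sequence, and fusion of two feature vectors. The dependency-free Python sketch below illustrates each on toy numbers. It is not the authors' implementation. The central-difference `delta`, the dot-product scoring in `attention_pool`, the fixed `query` vector (which a trained model would learn), the concatenation in `fuse`, and all toy values are assumptions made for illustration only.

```python
import math


def delta(frames):
    """First-order time derivative of a feature sequence.

    frames: list of T feature vectors (lists of floats).
    Uses a central difference, one-sided at the edges. This is a
    simplification of the regression-based delta often used with MFCCs.
    """
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        lo, hi = max(t - 1, 0), min(t + 1, last)
        span = hi - lo
        out.append([(n - p) / span if span else 0.0
                    for p, n in zip(frames[lo], frames[hi])])
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Softmax-weighted sum over time.

    The score of frame t is dot(frame_t, query); `query` is a hypothetical
    stand-in for parameters that would normally be learned.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights


def fuse(*vectors):
    """Feature-level fusion by simple concatenation (an assumption here)."""
    fused = []
    for vec in vectors:
        fused.extend(vec)
    return fused


# Toy "MFCC" sequence: 3 frames, 2 coefficients each (made-up values).
mfcc = [[0.0, 1.0], [1.0, 1.0], [4.0, 1.0]]
d_mfcc = delta(mfcc)

# Per-frame input to the sequence branch: static features plus deltas.
seq = [m + dm for m, dm in zip(mfcc, d_mfcc)]

query = [1.0, 0.0, 0.0, 0.0]      # hypothetical attention parameters
seq_vec, weights = attention_pool(seq, query)

cnn_vec = [0.5, 0.5, 0.5]         # placeholder for a spectrogram-branch embedding
fused = fuse(cnn_vec, seq_vec)    # vector that would go to a classifier
```

In this toy run the last frame receives the largest attention weight because its first coefficient is largest, which shows how attention can emphasize particular positions in a sequence.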
Pages: 13
Related papers
50 in total
  • [31] Music Audio Sentiment Classification Based on CNN-BiLSTM and Attention Model
    Chen Zhen
    Liu Changhui
    2021 4TH INTERNATIONAL CONFERENCE ON ROBOTICS, CONTROL AND AUTOMATION ENGINEERING (RCAE 2021), 2021, : 156 - 160
  • [32] Research on EEG emotion recognition based on CNN+BiLSTM+self-attention model
    LI Xueqing
    LI Penghai
    FANG Zhendong
    CHENG Longlong
    WANG Zhiyong
    WANG Weijie
    Optoelectronics Letters, 2023, 19 (08) : 506 - 512
  • [33] Multilabel Emotion Detection from Bangla Text Using BiGRU and CNN-BiLSTM
    Rayhan, Md Maruf
    Al Musabe, Taif
    Islam, Md Arafatul
    2020 23RD INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT 2020), 2020,
  • [34] Text Recovery via Deep CNN-BiLSTM Recognition and Bayesian Inference
    Jiao, Libin
    Wu, Hao
    Wang, Haodi
    Bie, Rongfang
    IEEE ACCESS, 2018, 6 : 76416 - 76428
  • [35] Sign Language Recognition Based on CNN-BiLSTM Using RF Signals
    Zhang, Yajun
    Wang, Yuankang
    Li, Feng
    Yu, Weiqian
    Wang, Congcong
    Jiang, Ying
    IEEE ACCESS, 2024, 12 : 190487 - 190504
  • [36] Convolutional-Recurrent Neural Networks With Multiple Attention Mechanisms for Speech Emotion Recognition
    Jiang, Pengxu
    Xu, Xinzhou
    Tao, Huawei
    Zhao, Li
    Zou, Cairong
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (04) : 1564 - 1573
  • [37] CNN-BiLSTM Hybrid Model for Network Anomaly Detection in Internet of Things
    Omarov, Bauyrzhan
    Auelbekov, Omirlan
    Suliman, Azizah
    Zhaxanova, Ainur
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (03) : 436 - 444
  • [38] An Encrypted Speech Retrieval Method Based on Deep Perceptual Hashing and CNN-BiLSTM
    Zhang, Qiuyu
    Li, Yuzhou
    Hu, Yingjie
    Zhao, Xuejiao
    IEEE ACCESS, 2020, 8 : 148556 - 148569
  • [39] An Intrusion Detection Method Based on Attention Mechanism to Improve CNN-BiLSTM Model
    Shou, Dingyu
    Li, Chao
    Wang, Zhen
    Cheng, Song
    Hu, Xiaobo
    Zhang, Kai
    Wen, Mi
    Wang, Yong
    COMPUTER JOURNAL, 2023, 67 (05) : 1851 - 1865
  • [40] Research on CNN-BiLSTM Fall Detection Algorithm Based on Improved Attention Mechanism
    Li, Congcong
    Liu, Minghao
    Yan, Xinsheng
    Teng, Guifa
    APPLIED SCIENCES-BASEL, 2022, 12 (19):