TWACapsNet: a capsule network with two-way attention mechanism for speech emotion recognition

Authors
Wen, Xin-Cheng [1]
Liu, Kun-Hong [2]
Luo, Yan [3]
Ye, Jiaxin [4]
Chen, Liyan [2]
Affiliations
[1] Harbin Inst Technol Shenzhen, Dept Comp Sci, Shenzhen, Peoples R China
[2] Xiamen Univ, Sch Film, Xiamen, Peoples R China
[3] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China
[4] Fudan Univ, Inst Sci & Technol Brain inspired Intelligence, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speech emotion recognition; Attention mechanism; Neural networks; FEATURES; CNN;
DOI
10.1007/s00500-023-08957-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech Emotion Recognition (SER) is a challenging task, and a typical convolutional neural network (CNN) cannot handle speech data well directly, because a CNN tends to capture local information and ignore the overall characteristics. This paper proposes a Capsule Network with Two-Way Attention Mechanism (TWACapsNet for short) for the SER problem. TWACapsNet accepts spatial and spectral features as inputs, and a convolutional layer and a capsule layer are deployed to process these two types of features separately, in two ways. After that, two attention mechanisms are designed to enhance the information obtained from the spatial and spectral features. Finally, the results of the two ways are combined to form the final decision. The advantage of TWACapsNet is verified by experiments on multiple SER data sets, and the experimental results show that the proposed method outperforms widely deployed neural network models on three typical SER data sets. Furthermore, the combination of the two ways contributes to the higher and more stable performance of TWACapsNet.
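The two-way design described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give layer sizes, the form of the attention, or the fusion rule, so everything below is an assumption made for illustration. The sketch uses the standard capsule squashing non-linearity, a simple dot-product attention pooling with a hypothetical learned query vector, and fusion by averaging the class probabilities of the two branches. The names `branch_probs`, `fuse`, `query` and `w_out`, and the random inputs, are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def squash(s, axis=-1, eps=1e-9):
    """Capsule squashing: keeps the vector direction, maps its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def attention_pool(features, query):
    """Dot-product attention over T feature vectors (assumed form, not from the paper).

    features: (T, D) array, query: (D,) array.
    Returns the attention-weighted sum (D,) and the weights (T,).
    """
    weights = softmax(features @ query)
    return weights @ features, weights


def branch_probs(features, query, w_out):
    """One 'way': squash the features, attend over them, then classify."""
    capsules = squash(features)
    pooled, _ = attention_pool(capsules, query)
    return softmax(pooled @ w_out)


def fuse(p_a, p_b):
    """Assumed fusion rule: average the class probabilities of the two ways."""
    return 0.5 * (p_a + p_b)


# Random stand-ins for the spatial and spectral feature sequences.
rng = np.random.default_rng(0)
T, D, C = 20, 8, 4  # time steps, feature size, emotion classes (all hypothetical)
spatial = rng.normal(size=(T, D))
spectral = rng.normal(size=(T, D))

p_spatial = branch_probs(spatial, rng.normal(size=D), rng.normal(size=(D, C)))
p_spectral = branch_probs(spectral, rng.normal(size=D), rng.normal(size=(D, C)))
p_fused = fuse(p_spatial, p_spectral)
predicted_class = int(np.argmax(p_fused))
```

Averaging probabilities is only one plausible way to combine two branches; a weighted sum or a learned fusion layer would fit the same description in the abstract.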
Pages: 8701-8713
Page count: 13
Related papers
  • [1] Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning
    Aggarwal, Apeksha
    Srivastava, Akshat
    Agarwal, Ajay
    Chahal, Nidhi
    Singh, Dilbag
    Alnuaim, Abeer Ali
    Alhadlaq, Aseel
    Lee, Heung-No
    SENSORS, 2022, 22 (06)
  • [2] Sparse Autoencoder with Attention Mechanism for Speech Emotion Recognition
    Sun, Ting-Wei
    Wu, An-Yeu
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 146 - 149
  • [3] A Speech Emotion Recognition Method Based on Lightweight Capsule Network
    Wang Y.
    Gao S.
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2023, 52 (03): : 423 - 429
  • [4] DEEP CONVOLUTIONAL RECURRENT NEURAL NETWORK WITH ATTENTION MECHANISM FOR ROBUST SPEECH EMOTION RECOGNITION
    Huang, Che-Wei
    Narayanan, Shrikanth
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 583 - 588
  • [5] EEG emotion recognition based on the attention mechanism and pre-trained convolution capsule network
    Liu, Shuaiqi
    Wang, Zeyao
    An, Yanling
    Zhao, Jie
    Zhao, Yingying
    Zhang, Yu-Dong
    KNOWLEDGE-BASED SYSTEMS, 2023, 265
  • [6] Emotion recognition based on 3D matrices and two-way densely connected network
    Li, Hongli
    Du, Congcong
    Liu, Zhuocheng
    Liu, Jiahao
    Liu, Haoyu
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (12) : 8987 - 8997
  • [7] CONTEXT-AWARE ATTENTION MECHANISM FOR SPEECH EMOTION RECOGNITION
    Ramet, Gaetan
    Garner, Philip N.
    Baeriswyl, Michael
    Lazaridis, Alexandros
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 126 - 131
  • [8] Speech emotion recognition with embedded attention mechanism and hierarchical context
    Cheng Y.
    Chen Y.
    Chen Y.
    Yang Y.
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2019, 51 (11): : 100 - 107
  • [9] EFFECTIVE ATTENTION MECHANISM IN DYNAMIC MODELS FOR SPEECH EMOTION RECOGNITION
    Hsiao, Po-Wei
    Chen, Chia-Ping
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2526 - 2530