TWACapsNet: a capsule network with two-way attention mechanism for speech emotion recognition

Cited by: 0
Authors
Wen, Xin-Cheng [1]
Liu, Kun-Hong [2]
Luo, Yan [3]
Ye, Jiaxin [4]
Chen, Liyan [2]
Affiliations
[1] Harbin Inst Technol Shenzhen, Dept Comp Sci, Shenzhen, Peoples R China
[2] Xiamen Univ, Sch Film, Xiamen, Peoples R China
[3] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China
[4] Fudan Univ, Inst Sci & Technol Brain inspired Intelligence, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech emotion recognition; Attention mechanism; Neural networks; FEATURES; CNN;
DOI
10.1007/s00500-023-08957-5
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Speech emotion recognition (SER) is a challenging task, and a typical convolutional neural network (CNN) cannot handle speech data well directly, because a CNN tends to capture local information and ignore the overall characteristics. This paper proposes a Capsule Network with Two-Way Attention Mechanism (TWACapsNet for short) for the SER problem. TWACapsNet accepts spatial and spectral features as inputs, and a convolutional layer and a capsule layer are deployed to process these two types of features in two separate ways. After that, two attention mechanisms are designed to enhance the information obtained from the spatial and spectral features. Finally, the results of the two ways are combined to form the final decision. The advantage of TWACapsNet is verified by experiments on multiple SER data sets, and the experimental results show that the proposed method outperforms widely deployed neural network models on three typical SER data sets. Furthermore, the combination of the two ways contributes to the higher and more stable performance of TWACapsNet.
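The two-way idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the attention form, the layer sizes, or how the two ways are combined, so the dot-product attention pooling, the `query` and `class_weights` parameters, and the simple averaging in `fuse` are all assumptions made for illustration. Only the capsule "squash" nonlinearity is the standard one from the capsule network literature.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def squash(vec):
    """Capsule squash: keeps the direction, maps the length into [0, 1)."""
    sq_norm = sum(x * x for x in vec)
    if sq_norm == 0.0:
        return [0.0 for _ in vec]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in vec]

def attention_pool(frames, query):
    """Assumed attention: weight each frame by softmax(frame . query), then sum."""
    weights = softmax([sum(f * q for f, q in zip(frame, query)) for frame in frames])
    dim = len(frames[0])
    return [sum(w * frame[i] for w, frame in zip(weights, frames)) for i in range(dim)]

def branch_scores(frames, query, class_weights):
    """One 'way': attention pooling, squash, then a linear layer and softmax."""
    pooled = squash(attention_pool(frames, query))
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in class_weights]
    return softmax(logits)

def fuse(p_spatial, p_spectral):
    """Assumed late fusion: average the class probabilities of the two ways."""
    return [(a + b) / 2.0 for a, b in zip(p_spatial, p_spectral)]

# Toy inputs (hypothetical): three frames of 2-D features per way, two emotion classes.
spatial_frames = [[0.2, 0.9], [0.4, 0.1], [0.8, 0.3]]
spectral_frames = [[0.5, 0.5], [0.1, 0.7], [0.9, 0.2]]
query = [1.0, 0.5]
class_weights = [[1.0, -1.0], [-1.0, 1.0]]

fused = fuse(
    branch_scores(spatial_frames, query, class_weights),
    branch_scores(spectral_frames, query, class_weights),
)
print(fused)
```

Averaging is only one possible way to combine two ways; a learned weighting or concatenation before the classifier would fit the abstract's description equally well.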
Pages: 8701-8713
Number of pages: 13