TWACapsNet: a capsule network with two-way attention mechanism for speech emotion recognition

Cited by: 0
Authors
Wen, Xin-Cheng [1]
Liu, Kun-Hong [2]
Luo, Yan [3]
Ye, Jiaxin [4]
Chen, Liyan [2]
Affiliations
[1] Harbin Inst Technol Shenzhen, Dept Comp Sci, Shenzhen, Peoples R China
[2] Xiamen Univ, Sch Film, Xiamen, Peoples R China
[3] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China
[4] Fudan Univ, Inst Sci & Technol Brain inspired Intelligence, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech emotion recognition; Attention mechanism; Neural networks; FEATURES; CNN;
DOI
10.1007/s00500-023-08957-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech emotion recognition (SER) is a challenging task, and a typical convolutional neural network (CNN) cannot handle speech data well directly, because a CNN tends to capture local information and ignore the overall characteristics. This paper proposes a Capsule Network with Two-Way Attention Mechanism (TWACapsNet for short) for the SER problem. TWACapsNet accepts spatial and spectral features as inputs, and a convolutional layer and a capsule layer are deployed to process these two types of features separately in two ways. After that, two attention mechanisms are designed to enhance the information obtained from the spatial and spectral features. Finally, the results of the two ways are combined to form the final decision. The advantage of TWACapsNet is verified by experiments on multiple SER data sets, and the experimental results show that the proposed method outperforms widely deployed neural network models on three typical SER data sets. Furthermore, the combination of the two ways contributes to the higher and more stable performance of TWACapsNet.
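The abstract names three generic building blocks: capsule layers, attention, and a final combination of two branches. The following is a minimal sketch of those ideas in plain Python, not the authors' implementation. The abstract does not specify the layer details, so everything here is an assumption made for illustration: the function names (`squash`, `attention_pool`, `fuse`), the use of dot-product attention, and fusion by averaging class probabilities. Only the squashing nonlinearity is the standard one from the capsule network literature.

```python
import math

def squash(vec):
    """Standard capsule squashing: keeps direction, maps length into [0, 1)."""
    norm_sq = sum(x * x for x in vec)
    if norm_sq == 0.0:
        return [0.0 for _ in vec]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in vec]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, query):
    """Assumed dot-product attention: weight each frame by similarity to query."""
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]

def fuse(probs_a, probs_b):
    """Assumed late fusion: average the class probabilities of the two branches."""
    return [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]

# Hypothetical per-branch class scores for three emotion classes.
spatial_probs = softmax([2.0, 0.5, 0.1])
spectral_probs = softmax([1.0, 1.5, 0.2])
fused = fuse(spatial_probs, spectral_probs)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

In this toy example the two branches disagree on the most likely class, and the fused distribution settles the decision, which illustrates why combining two branches can give a more stable prediction than either branch alone.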
Pages: 8701-8713
Number of pages: 13
Related papers
50 records in total
  • [31] Improved ShuffleNet V2 network with attention for speech emotion recognition
    Udeh, Chinonso Paschal
    Chen, Luefeng
    Du, Sheng
    Liu, Yulong
    Li, Min
    Wu, Min
    INFORMATION SCIENCES, 2025, 689
  • [32] AN INTERACTION-AWARE ATTENTION NETWORK FOR SPEECH EMOTION RECOGNITION IN SPOKEN DIALOGS
    Yeh, Sung-Lin
    Lin, Yun-Shao
    Lee, Chi-Chun
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6685 - 6689
  • [33] Speech Emotion Recognition Using Sequential Capsule Networks
    Wu, Xixin
    Cao, Yuewen
    Lu, Hui
    Liu, Songxiang
    Wang, Disong
    Wu, Zhiyong
    Liu, Xunying
    Meng, Helen
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3280 - 3291
  • [34] A novel conversational hierarchical attention network for speech emotion recognition in dyadic conversation
    Tellai, Mohammed
    Gao, Lijian
    Mao, Qirong
    Abdelaziz, Mounir
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (21) : 59699 - 59723
  • [35] MULTIMODAL CROSS- AND SELF-ATTENTION NETWORK FOR SPEECH EMOTION RECOGNITION
    Sun, Licai
    Liu, Bin
    Tao, Jianhua
    Lian, Zheng
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 4275 - 4279
  • [36] The Impact of Attention Mechanisms on Speech Emotion Recognition
    Chen, Shouyan
    Zhang, Mingyan
    Yang, Xiaofen
    Zhao, Zhijia
    Zou, Tao
    Sun, Xinqi
    SENSORS, 2021, 21 (22)
  • [37] Self-attention for Speech Emotion Recognition
    Tarantino, Lorenzo
    Garner, Philip N.
    Lazaridis, Alexandros
    INTERSPEECH 2019, 2019, : 2578 - 2582
  • [38] Emotion Recognition via Multiscale Feature Fusion Network and Attention Mechanism
    Jiang, Yiye
    Xie, Songyun
    Xie, Xinzhou
    Cui, Yujie
    Tang, Hao
    IEEE SENSORS JOURNAL, 2023, 23 (10) : 10790 - 10800
  • [39] A two-way window on face recognition
    Breen, N
    Coltheart, M
    Caine, D
    TRENDS IN COGNITIVE SCIENCES, 2001, 5 (06) : 234 - 235
  • [40] PulseEmoNet: Pulse emotion network for speech emotion recognition
    Zhang, Huiyun
    Tang, Gaigai
    Huang, Heming
    Yuan, Zhu
    Li, Zongjin
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 105