TWACapsNet: a capsule network with two-way attention mechanism for speech emotion recognition

Cited by: 0

Authors:

Wen, Xin-Cheng [1]

Liu, Kun-Hong [2]

Luo, Yan [3]

Ye, Jiaxin [4]

Chen, Liyan [2]

Affiliations:

[1] Harbin Inst Technol Shenzhen, Dept Comp Sci, Shenzhen, Peoples R China

[2] Xiamen Univ, Sch Film, Xiamen, Peoples R China

[3] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China

[4] Fudan Univ, Inst Sci & Technol Brain inspired Intelligence, Shanghai, Peoples R China

Source:

SOFT COMPUTING | 2023 / Volume 28 / Issue 15-16

Funding:

National Natural Science Foundation of China;

Keywords:

Speech emotion recognition; Attention mechanism; Neural networks; FEATURES; CNN;

DOI:

10.1007/s00500-023-08957-5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech emotion recognition (SER) is a challenging task, and a typical convolutional neural network (CNN) cannot handle speech data well directly, because a CNN tends to capture local information and ignore the overall characteristics. This paper proposes a Capsule Network with Two-Way Attention Mechanism (TWACapsNet for short) for the SER problem. TWACapsNet accepts spatial and spectral features as inputs, and a convolutional layer and a capsule layer are deployed to process these two types of features in two separate ways. After that, two attention mechanisms are designed to enhance the information obtained from the spatial and spectral features. Finally, the results of the two ways are combined to form the final decision. The advantage of TWACapsNet is verified by experiments on multiple SER data sets, and the experimental results show that the proposed method outperforms widely deployed neural network models on three typical SER data sets. Furthermore, the combination of the two ways contributes to the higher and more stable performance of TWACapsNet.
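
The two-way idea in the abstract (one branch per feature type, an attention step in each, then a combined decision) can be illustrated with a toy sketch. This is not the authors' implementation: the convolutional and capsule layers are omitted, random vectors stand in for per-frame features, and the dot-product attention, the averaging fusion rule, and every name below (attention_pool, two_way_predict, params) are assumptions made purely for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Weight each frame by softmax(frame . query) and sum the frames.

    Assumption: simple dot-product attention; the paper's attention
    mechanisms may be defined differently.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)]


def linear(vec, weight_rows):
    """One output per weight row: a bias-free linear classifier head."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight_rows]


def two_way_predict(spatial_frames, spectral_frames, params):
    """Run both branches, then fuse by averaging class probabilities.

    Assumption: averaging is a placeholder for "the results of the two
    ways are combined"; the abstract does not specify the fusion rule.
    """
    branch_probs = []
    for frames, key in ((spatial_frames, "spatial"),
                        (spectral_frames, "spectral")):
        pooled = attention_pool(frames, params[key]["query"])
        branch_probs.append(softmax(linear(pooled, params[key]["classifier"])))
    return [(a + b) / 2 for a, b in zip(*branch_probs)]


# Hypothetical toy setup: 4-dimensional frame features, 3 emotion classes.
rng = random.Random(0)
dim, n_classes = 4, 3


def rand_vec(n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


params = {
    key: {
        "query": rand_vec(dim),
        "classifier": [rand_vec(dim) for _ in range(n_classes)],
    }
    for key in ("spatial", "spectral")
}
spatial = [rand_vec(dim) for _ in range(5)]   # 5 stand-in frames
spectral = [rand_vec(dim) for _ in range(7)]  # 7 stand-in frames

fused = two_way_predict(spatial, spectral, params)
print(fused)  # one fused probability per class
```

Because each branch ends in a softmax and the fusion is a plain average, the fused vector is itself a probability distribution over the classes, which makes it easy to compare a single branch against the combination.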

Pages: 8701-8713

Number of pages: 13
