Discriminative Feature Representation Based on Cascaded Attention Network with Adversarial Joint Loss for Speech Emotion Recognition

Cited: 5
|
Authors
Liu, Yang [1 ]
Sun, Haoqin [1 ]
Guan, Wenbo [1 ]
Xia, Yuqi [1 ]
Zhao, Zhen [1 ]
Affiliations
[1] Qingdao Univ Sci & Technol, Sch Informat Sci & Technol, Qingdao 266061, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
Speech Emotion Recognition; Three-channel Features; Cascaded Attention Network; Adversarial Joint Loss;
DOI
10.21437/Interspeech.2022-11480
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Accurately recognizing emotion from speech is a necessary yet challenging task due to its complexity. A common problem in many previous studies is that certain emotions are severely misclassified. In this paper, we propose a novel framework integrating cascaded attention and an adversarial joint loss for speech emotion recognition, aiming to resolve these confusions by placing greater emphasis on the emotions that are difficult to classify correctly. Specifically, we propose a cascaded attention network to extract effective emotional features, in which spatiotemporal attention selectively locates the targeted emotional regions in the input features. Within these targeted regions, self-attention with head fusion captures the long-distance dependencies of temporal features. Furthermore, an adversarial joint loss strategy is proposed to distinguish emotional embeddings with high similarity via hard triplets generated in an adversarial fashion. Experimental results on the benchmark dataset IEMOCAP demonstrate that our method gains absolute improvements of 3.17% and 0.39% over state-of-the-art strategies in terms of weighted accuracy (WA) and unweighted accuracy (UA), respectively.
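The joint-loss idea sketched in the abstract — a classification loss combined with a triplet term that focuses on hard, easily confused pairs — can be illustrated with a minimal batch-hard formulation. This is only an illustrative sketch under assumed names and a guessed margin value, not the paper's actual adversarial implementation (the adversarial generation of hard triplets is omitted here).

```python
# Illustrative sketch (not the paper's implementation): cross-entropy on the
# classifier logits plus a batch-hard triplet term over the emotion embeddings.
# Function names, the margin, and the weight `alpha` are assumptions.
import numpy as np

def hard_triplet_loss(embeddings, labels, margin=0.2):
    """Batch-hard triplet loss: for each anchor, use its farthest
    same-label embedding and its closest different-label embedding."""
    n = len(embeddings)
    # Pairwise Euclidean distances between all embeddings in the batch.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    losses = []
    for i in range(n):
        pos = dist[i][(labels == labels[i]) & (np.arange(n) != i)]
        neg = dist[i][labels != labels[i]]
        if len(pos) == 0 or len(neg) == 0:
            continue  # anchor has no valid positive or negative in this batch
        losses.append(max(0.0, pos.max() - neg.min() + margin))
    return float(np.mean(losses)) if losses else 0.0

def joint_loss(logits, embeddings, labels, alpha=0.5):
    """Cross-entropy on the logits plus a weighted triplet term."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    return ce + alpha * hard_triplet_loss(embeddings, labels)
```

The triplet term is zero when every anchor's hardest negative is already farther away than its hardest positive by at least the margin, so the gradient pressure concentrates on the confusable emotion pairs the abstract targets.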
Pages: 4750 - 4754
Page count: 5
Related Papers
50 records total
  • [21] Speech Emotion Recognition Model Based on Joint Modeling of Discrete and Dimensional Emotion Representation
    Bautista, John Lorenzo
    Shin, Hyun Soon
    APPLIED SCIENCES-BASEL, 2025, 15 (02):
  • [22] Adversarial Data Augmentation Network for Speech Emotion Recognition
    Yi, Lu
    Mak, Man-Wai
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 529 - 534
  • [23] Transferable driver facial expression recognition based on joint discriminative correlation alignment network with enhanced feature attention
    Chen, Xiaobo
    Du, Jian
    Deng, Fuwen
    Zhao, Feng
    IET INTELLIGENT TRANSPORT SYSTEMS, 2023, 17 (12) : 2444 - 2457
  • [24] A speech emotion recognition method for the elderly based on feature fusion and attention mechanism
    Jian, Qijian
    Xiang, Min
    Huang, Wei
    THIRD INTERNATIONAL CONFERENCE ON ELECTRONICS AND COMMUNICATION; NETWORK AND COMPUTER TECHNOLOGY (ECNCT 2021), 2022, 12167
  • [25] A bimodal network based on Audio-Text-Interactional-Attention with ArcFace loss for speech emotion recognition
    Tang, Yuwu
    Hu, Ying
    He, Liang
    Huang, Hao
    SPEECH COMMUNICATION, 2022, 143 : 21 - 32
  • [26] Joint Bottleneck Feature and Attention Model for Speech Recognition
    Long Xingyan
    Qu Dan
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON MATHEMATICS AND ARTIFICIAL INTELLIGENCE (ICMAI 2018), 2018, : 46 - 50
  • [27] Improving Speech Emotion Recognition With Adversarial Data Augmentation Network
    Yi, Lu
    Mak, Man-Wai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (01) : 172 - 184
  • [28] Speech emotion classification using attention based network and regularized feature selection
    Akinpelu, Samson
    Viriri, Serestina
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [30] MSDSANet: Multimodal Emotion Recognition Based on Multi-Stream Network and Dual-Scale Attention Network Feature Representation
    Sun, Weitong
    Yan, Xingya
    Su, Yuping
    Wang, Gaihua
    Zhang, Yumei
    SENSORS, 2025, 25 (07)