Discriminative Feature Representation Based on Cascaded Attention Network with Adversarial Joint Loss for Speech Emotion Recognition

Cited by: 5

Authors:

Liu, Yang [1]

Sun, Haoqin [1]

Guan, Wenbo [1]

Xia, Yuqi [1]

Zhao, Zhen [1]

Affiliation:

[1] Qingdao University of Science and Technology, School of Information Science and Technology, Qingdao 266061, People's Republic of China

Source:

INTERSPEECH 2022 | 2022

Keywords:

Speech Emotion Recognition; Three-channel Features; Cascaded Attention Network; Adversarial Joint Loss;

DOI:

10.21437/Interspeech.2022-11480

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Accurately recognizing emotion from speech is a necessary yet challenging task due to its complexity. A common problem in most previous studies is that certain emotions are severely misclassified. In this paper, we propose a novel framework integrating cascaded attention and adversarial joint loss for speech emotion recognition, aiming to resolve these confusions by placing more emphasis on the emotions that are difficult to classify correctly. Specifically, we propose a cascaded attention network to extract effective emotional features, in which spatiotemporal attention selectively locates the targeted emotional regions in the input features. Within these targeted regions, self-attention with head fusion captures the long-distance dependencies of temporal features. Furthermore, an adversarial joint loss strategy is proposed to distinguish highly similar emotional embeddings using hard triplets generated in an adversarial fashion. Experimental results on the benchmark dataset IEMOCAP demonstrate that our method achieves absolute improvements of 3.17% and 0.39% over state-of-the-art strategies in terms of weighted accuracy (WA) and unweighted accuracy (UA), respectively.
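
The abstract reports results as weighted accuracy (WA) and unweighted accuracy (UA). The following is a minimal sketch of how these two metrics are commonly computed in speech emotion recognition work, assuming WA is the overall fraction of correctly classified utterances and UA is the mean of the per-class recalls. This record does not give the paper's exact evaluation protocol, and the function names and toy labels below are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally."""
    total = defaultdict(int)  # utterances per true class
    hit = defaultdict(int)    # correctly classified utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Toy labels (hypothetical, not taken from IEMOCAP).
y_true = ["angry", "angry", "angry", "happy", "sad", "sad"]
y_pred = ["angry", "angry", "happy", "happy", "sad", "angry"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.4f}, UA = {ua:.4f}")
```

Because UA averages recall over classes, it is less affected by class imbalance than WA. This is why the two figures can differ noticeably when some emotions are rarer or more often confused than others.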

Pages: 4750-4754

Page count: 5

