Discriminative Feature Representation Based on Cascaded Attention Network with Adversarial Joint Loss for Speech Emotion Recognition

Cited: 5
|
Authors
Liu, Yang [1 ]
Sun, Haoqin [1 ]
Guan, Wenbo [1 ]
Xia, Yuqi [1 ]
Zhao, Zhen [1 ]
Affiliations
[1] Qingdao Univ Sci & Technol, Sch Informat Sci & Technol, Qingdao 266061, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
Speech Emotion Recognition; Three-channel Features; Cascaded Attention Network; Adversarial Joint Loss;
DOI
10.21437/Interspeech.2022-11480
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Accurately recognizing emotion from speech is a necessary yet challenging task due to its complexity. A common problem in many previous studies is that certain emotions are severely misclassified. In this paper, we propose a novel framework integrating cascaded attention and an adversarial joint loss for speech emotion recognition, aiming to resolve these confusions by placing greater emphasis on the emotions that are difficult to classify correctly. Specifically, we propose a cascaded attention network to extract effective emotional features, in which spatiotemporal attention selectively locates the targeted emotional regions in the input features. Within these targeted regions, self-attention with head fusion captures the long-distance dependencies of temporal features. Furthermore, an adversarial joint loss strategy is proposed to distinguish emotional embeddings with high similarity via hard triplets generated in an adversarial fashion. Experimental results on the benchmark dataset IEMOCAP demonstrate that our method gains absolute improvements of 3.17% and 0.39% over state-of-the-art strategies in terms of weighted accuracy (WA) and unweighted accuracy (UA), respectively.
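The joint-loss idea sketched in the abstract — a classification loss combined with a triplet term that focuses on hard, easily confused pairs — can be illustrated with a minimal batch-hard formulation. This is only an illustrative sketch under assumed names and a guessed margin value, not the paper's actual adversarial implementation (the adversarial generation of hard triplets is omitted here).

```python
# Illustrative sketch (not the paper's implementation): cross-entropy on the
# classifier logits plus a batch-hard triplet term over the emotion embeddings.
# Function names, the margin, and the weight `alpha` are assumptions.
import numpy as np

def hard_triplet_loss(embeddings, labels, margin=0.2):
    """Batch-hard triplet loss: for each anchor, use its farthest
    same-label embedding and its closest different-label embedding."""
    n = len(embeddings)
    # Pairwise Euclidean distances between all embeddings in the batch.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    losses = []
    for i in range(n):
        pos = dist[i][(labels == labels[i]) & (np.arange(n) != i)]
        neg = dist[i][labels != labels[i]]
        if len(pos) == 0 or len(neg) == 0:
            continue  # anchor has no valid positive or negative in this batch
        losses.append(max(0.0, pos.max() - neg.min() + margin))
    return float(np.mean(losses)) if losses else 0.0

def joint_loss(logits, embeddings, labels, alpha=0.5):
    """Cross-entropy on the logits plus a weighted triplet term."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    return ce + alpha * hard_triplet_loss(embeddings, labels)
```

The triplet term is zero when every anchor's hardest negative is already farther away than its hardest positive by at least the margin, so the gradient pressure concentrates on the confusable emotion pairs the abstract targets.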
Pages: 4750 - 4754
Page count: 5
Related Papers
50 records total
  • [21] Speech Emotion Recognition Model Based on Joint Modeling of Discrete and Dimensional Emotion Representation
    Bautista, John Lorenzo
    Shin, Hyun Soon
    APPLIED SCIENCES-BASEL, 2025, 15 (02):
  • [22] Adversarial Data Augmentation Network for Speech Emotion Recognition
    Yi, Lu
    Mak, Man-Wai
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 529 - 534
  • [23] Transferable driver facial expression recognition based on joint discriminative correlation alignment network with enhanced feature attention
    Chen, Xiaobo
    Du, Jian
    Deng, Fuwen
    Zhao, Feng
    IET INTELLIGENT TRANSPORT SYSTEMS, 2023, 17 (12) : 2444 - 2457
  • [24] A speech emotion recognition method for the elderly based on feature fusion and attention mechanism
    Jian, Qijian
    Xiang, Min
    Huang, Wei
    THIRD INTERNATIONAL CONFERENCE ON ELECTRONICS AND COMMUNICATION; NETWORK AND COMPUTER TECHNOLOGY (ECNCT 2021), 2022, 12167
  • [25] A bimodal network based on Audio-Text-Interactional-Attention with ArcFace loss for speech emotion recognition
    Tang, Yuwu
    Hu, Ying
    He, Liang
    Huang, Hao
    SPEECH COMMUNICATION, 2022, 143 : 21 - 32
  • [26] Joint Bottleneck Feature and Attention Model for Speech Recognition
    Long Xingyan
    Qu Dan
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON MATHEMATICS AND ARTIFICIAL INTELLIGENCE (ICMAI 2018), 2018, : 46 - 50
  • [27] Improving Speech Emotion Recognition With Adversarial Data Augmentation Network
    Yi, Lu
    Mak, Man-Wai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (01) : 172 - 184
  • [28] Speech emotion classification using attention based network and regularized feature selection
    Akinpelu, Samson
    Viriri, Serestina
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [30] MSDSANet: Multimodal Emotion Recognition Based on Multi-Stream Network and Dual-Scale Attention Network Feature Representation
    Sun, Weitong
    Yan, Xingya
    Su, Yuping
    Wang, Gaihua
    Zhang, Yumei
    SENSORS, 2025, 25 (07)