Discriminative Feature Representation Based on Cascaded Attention Network with Adversarial Joint Loss for Speech Emotion Recognition

Cited by: 5
Authors
Liu, Yang [1]
Sun, Haoqin [1]
Guan, Wenbo [1]
Xia, Yuqi [1]
Zhao, Zhen [1]
Affiliations
[1] Qingdao Univ Sci & Technol, Sch Informat Sci & Technol, Qingdao 266061, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
Speech Emotion Recognition; Three-channel Features; Cascaded Attention Network; Adversarial Joint Loss;
DOI
10.21437/Interspeech.2022-11480
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Accurately recognizing emotion from speech is a necessary yet challenging task due to its complexity. A common problem in most previous studies is that certain emotions are severely misclassified. In this paper, we propose a novel framework that integrates cascaded attention and an adversarial joint loss for speech emotion recognition, aiming to resolve these confusions by placing more emphasis on the emotions that are difficult to classify correctly. Specifically, we propose a cascaded attention network to extract effective emotional features, in which spatiotemporal attention selectively locates the targeted emotional regions in the input features. Within these targeted regions, self-attention with head fusion captures the long-distance dependencies of temporal features. Furthermore, an adversarial joint loss strategy is proposed to distinguish highly similar emotional embeddings by means of hard triplets generated in an adversarial fashion. Experimental results on the benchmark dataset IEMOCAP demonstrate that our method achieves absolute improvements of 3.17% and 0.39% over state-of-the-art strategies in terms of weighted accuracy (WA) and unweighted accuracy (UA), respectively.
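The abstract mentions separating highly similar emotional embeddings through hard triplets. As a rough illustration of what a "hard triplet" is, the sketch below computes a generic batch-hard triplet margin loss in NumPy: for each anchor it picks the farthest same-class embedding and the closest other-class embedding. This is an assumption-laden stand-in, not the paper's adversarial joint loss; the function name `batch_hard_triplet_loss`, the Euclidean distance, and the margin value of 0.2 are all hypothetical choices made for the example, and the adversarial generation of triplets is not modeled.

```python
import numpy as np


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean batch-hard triplet loss (illustrative only, not the paper's loss).

    Only anchors that have at least one positive and one negative
    in the batch contribute to the mean.
    """
    emb = np.asarray(embeddings, dtype=float)
    lab = np.asarray(labels)

    # Pairwise Euclidean distances between all embeddings, shape (n, n).
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    same = lab[:, None] == lab[None, :]
    not_self = ~np.eye(len(lab), dtype=bool)
    pos_mask = same & not_self   # same emotion class, different sample
    neg_mask = ~same             # different emotion class

    # Hardest positive: the farthest same-class sample.
    hardest_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)
    # Hardest negative: the closest other-class sample.
    hardest_neg = np.where(neg_mask, dist, np.inf).min(axis=1)

    valid = pos_mask.any(axis=1) & neg_mask.any(axis=1)
    if not valid.any():
        return 0.0

    # Hinge on the margin: zero once negatives are far enough away.
    losses = np.maximum(hardest_pos[valid] - hardest_neg[valid] + margin, 0.0)
    return float(losses.mean())


if __name__ == "__main__":
    # Toy 2-D embeddings: class 1 has a sample lying between the class-0 samples.
    toy_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [5.0, 0.0]]
    toy_labels = [0, 0, 1, 1]
    print(batch_hard_triplet_loss(toy_embeddings, toy_labels))
```

In the toy batch, the class-1 sample at 0.5 sits between the two class-0 samples, so every anchor has a close negative and the loss is positive; once the classes are well separated relative to the margin, the loss drops to zero.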
Pages: 4750-4754
Number of pages: 5