Discriminative Feature Representation Based on Cascaded Attention Network with Adversarial Joint Loss for Speech Emotion Recognition

被引:5
|
作者
Liu, Yang [1 ]
Sun, Haoqin [1 ]
Guan, Wenbo [1 ]
Xia, Yuqi [1 ]
Zhao, Zhen [1 ]
机构
[1] Qingdao Univ Sci & Technol, Sch Informat Sci & Technol, Qingdao 266061, Peoples R China
来源
关键词
Speech Emotion Recognition; Three-channel Features; Cascaded Attention Network; Adversarial Joint Loss;
D O I
10.21437/Interspeech.2022-11480
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Accurately recognizing emotion from speech is a necessary yet challenging task due to its complexity. A common problem existing in most of the previous studies is that some of the particular emotions are severely misclassified. In this paper, we propose a novel framework integrating cascaded attention and adversarial joint loss for speech emotion recognition, aiming at discriminating the confusions by emphasizing more on the emotions which are difficult to be correctly classified. Specifically, we propose a cascaded attention network to extract effective emotional features, where spatiotemporal attention selectively locates the targeted emotional regions from the input features. In these targeted regions, the self-attention with head fusion captures the long-distance dependence of temporal features. Furthermore, an adversarial joint loss strategy is proposed to distinguish the emotional embeddings with high similarity by the generated hard triplets in an adversarial fashion. Experimental results on the benchmark dataset IEMOCAP demonstrate that our method gains an absolute improvement of 3.17% and 0.39% over state-of-the-art strategies in terms of weighted accuracy (WA) and unweighted accuracy (UA), respectively.
引用
收藏
页码:4750 / 4754
页数:5
相关论文
共 50 条
  • [1] A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition
    Liu, Yang
    Sun, Haoqin
    Guan, Wenbo
    Xia, Yuqi
    Li, Yongwei
    Unoki, Masashi
    Zhao, Zhen
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 1063 - 1074
  • [2] Discriminative feature learning based on multi-view attention network with diffusion joint loss for speech emotion recognition
    Liu, Yang
    Chen, Xin
    Song, Yuan
    Li, Yarong
    Wang, Shengbei
    Yuan, Weitao
    Li, Yongwei
    Zhao, Zhen
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 137
  • [3] Speech Emotion Recognition Using Cascaded Attention Network with Joint Loss for Discrimination of Confusions
    Liu, Yang
    Sun, Haoqin
    Guan, Wenbo
    Xia, Yuqi
    Zhao, Zhen
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (04) : 595 - 604
  • [4] Speech Emotion Recognition Using Cascaded Attention Network with Joint Loss for Discrimination of Confusions
    Yang Liu
    Haoqin Sun
    Wenbo Guan
    Yuqi Xia
    Zhen Zhao
    Machine Intelligence Research, 2023, 20 : 595 - 604
  • [5] A Joint Network Based on Interactive Attention for Speech Emotion Recognition
    Hu, Ying
    Hou, Shijing
    Yang, Huamin
    Huang, Hao
    He, Liang
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1715 - 1720
  • [6] A Multitask Learning Approach Based on Cascaded Attention Network and Self-Adaption Loss for Speech Emotion Recognition
    Liu, Yang
    Xia, Yuqi
    Sun, Haoqin
    Meng, Xiaolei
    Bai, Jianxiong
    Guan, Wenbo
    Zhao, Zhen
    LI, Yongwei
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2023, E106A (06) : 876 - 885
  • [7] Speech Emotion Recognition with Discriminative Feature Learning
    Zhou, Huan
    Liu, Kai
    INTERSPEECH 2020, 2020, : 4094 - 4097
  • [8] Discriminative Feature Learning for Speech Emotion Recognition
    Zhang, Yuying
    Zou, Yuexian
    Peng, Junyi
    Luo, Danqing
    Huang, Dongyan
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: TEXT AND TIME SERIES, PT IV, 2019, 11730 : 198 - 210
  • [9] DOMAIN-ADVERSARIAL AUTOENCODER WITH ATTENTION BASED FEATURE LEVEL FUSION FOR SPEECH EMOTION RECOGNITION
    Gao, Yuan
    Liu, JiaXing
    Wang, Longbiao
    Dang, Jianwu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6314 - 6318
  • [10] Feature representation for speech emotion Recognition
    Abdollahpour, Mehdi
    Zamani, Lafar
    Rad, Hamidreza Saligheh
    2017 25TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2017, : 1465 - 1468