ISNet: Individual Standardization Network for Speech Emotion Recognition

被引:23
|
作者
Fan, Weiquan [1 ]
Xu, Xiangmin [1 ]
Cai, Bolun [1 ]
Xing, Xiaofen [1 ]
机构
[1] South China Univ Technol, Sch Elect & Informat, Guangzhou 510640, Peoples R China
基金
中国国家自然科学基金;
关键词
Speech recognition; Emotion recognition; Feature extraction; Benchmark testing; Standardization; Speech processing; Task analysis; Individual standardization network (ISNet); speech emotion recognition; individual differences; metric; dataset; CLASSIFICATION; ATTENTION; FEATURES; VOICE;
D O I
10.1109/TASLP.2022.3171965
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Speech emotion recognition plays an essential role in human-computer interaction. However, cross-individual representation learning and individual-agnostic systems are challenging due to the distribution deviation caused by individual differences. The existing related approaches mostly use the auxiliary task of speaker recognition to eliminate individual differences. Unfortunately, although these methods can reduce interindividual voiceprint differences, it is difficult to dissociate interindividual expression differences since each individual has its unique expression habits. In this paper, we propose an individual standardization network (ISNet) for speech emotion recognition to alleviate the problem of interindividual emotion confusion caused by individual differences. Specifically, we model individual benchmarks as representations of nonemotional neutral speech, and ISNet realizes individual standardization using the automatically generated benchmark, which improves the robustness of individual-agnostic emotion representations. In response to individual differences, we also propose more comprehensive and meaningful individual-level evaluation metrics. In addition, we continue our previous work to construct a challenging large-scale speech emotion dataset (LSSED). We propose a more reasonable division method of the training set and testing set to prevent individual information leakage. Experimental results on datasets of both large and small scales have proven the effectiveness of ISNet, and the new state-of-the-art performance is achieved under the same experimental conditions on IEMOCAP and LSSED.
引用
收藏
页码:1803 / 1814
页数:12
相关论文
共 50 条
  • [1] PulseEmoNet: Pulse emotion network for speech emotion recognition
    Zhang, Huiyun
    Tang, Gaigai
    Huang, Heming
    Yuan, Zhu
    Li, Zongjin
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 105
  • [2] Speech Emotion Recognition with Hybrid Neural Network
    Wei, Chuanzheng
    Sun, Xiao
    Tian, Fang
    Ren, Fuji
    5TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM 2019), 2019, : 298 - 302
  • [3] Deep scattering network for speech emotion recognition
    Singh, Premjeet
    Saha, Goutam
    Sahidullah, Md
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 131 - 135
  • [4] Graph Isomorphism Network for Speech Emotion Recognition
    Liu, Jiawang
    Wang, Haoxiang
    INTERSPEECH 2021, 2021, : 3405 - 3409
  • [5] Speech emotion recognition using the novel PEmoNet (Parallel Emotion Network)
    Bhangale, Kishor B.
    Kothandaraman, Mohanaprasad
    APPLIED ACOUSTICS, 2023, 212
  • [6] Adversarial Data Augmentation Network for Speech Emotion Recognition
    Yi, Lu
    Mak, Man-Wai
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 529 - 534
  • [7] Application of probabilistic neural network for speech emotion recognition
    Deshmukh S.
    Gupta P.
    International Journal of Speech Technology, 2024, 27 (01) : 19 - 28
  • [8] Design of a Convolutional Neural Network for Speech Emotion Recognition
    Lee, Kyong Hee
    Kim, Do Hyun
    11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020), 2020, : 1332 - 1335
  • [9] A neural network approach for human emotion recognition in speech
    Bhatti, MW
    Wang, YJ
    Guan, L
    2004 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 2, PROCEEDINGS, 2004, : 181 - 184
  • [10] Speech Emotion Recognition Based on Deep Belief Network
    Shi, Peng
    2018 IEEE 15TH INTERNATIONAL CONFERENCE ON NETWORKING, SENSING AND CONTROL (ICNSC), 2018,