ISNet: Individual Standardization Network for Speech Emotion Recognition

被引:23
|
作者
Fan, Weiquan [1 ]
Xu, Xiangmin [1 ]
Cai, Bolun [1 ]
Xing, Xiaofen [1 ]
机构
[1] South China Univ Technol, Sch Elect & Informat, Guangzhou 510640, Peoples R China
基金
中国国家自然科学基金;
关键词
Speech recognition; Emotion recognition; Feature extraction; Benchmark testing; Standardization; Speech processing; Task analysis; Individual standardization network (ISNet); speech emotion recognition; individual differences; metric; dataset; CLASSIFICATION; ATTENTION; FEATURES; VOICE;
D O I
10.1109/TASLP.2022.3171965
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Speech emotion recognition plays an essential role in human-computer interaction. However, cross-individual representation learning and individual-agnostic systems are challenging due to the distribution deviation caused by individual differences. The existing related approaches mostly use the auxiliary task of speaker recognition to eliminate individual differences. Unfortunately, although these methods can reduce interindividual voiceprint differences, it is difficult to dissociate interindividual expression differences since each individual has its unique expression habits. In this paper, we propose an individual standardization network (ISNet) for speech emotion recognition to alleviate the problem of interindividual emotion confusion caused by individual differences. Specifically, we model individual benchmarks as representations of nonemotional neutral speech, and ISNet realizes individual standardization using the automatically generated benchmark, which improves the robustness of individual-agnostic emotion representations. In response to individual differences, we also propose more comprehensive and meaningful individual-level evaluation metrics. In addition, we continue our previous work to construct a challenging large-scale speech emotion dataset (LSSED). We propose a more reasonable division method of the training set and testing set to prevent individual information leakage. Experimental results on datasets of both large and small scales have proven the effectiveness of ISNet, and the new state-of-the-art performance is achieved under the same experimental conditions on IEMOCAP and LSSED.
引用
收藏
页码:1803 / 1814
页数:12
相关论文
共 50 条
  • [41] Speech emotion recognition based on emotion perception
    Gang Liu
    Shifang Cai
    Ce Wang
    EURASIP Journal on Audio, Speech, and Music Processing, 2023
  • [42] Speech emotion recognition based on emotion perception
    Liu, Gang
    Cai, Shifang
    Wang, Ce
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (01)
  • [43] Autoencoder With Emotion Embedding for Speech Emotion Recognition
    Zhang, Chenghao
    Xue, Lei
    IEEE ACCESS, 2021, 9 : 51231 - 51241
  • [44] Autoencoder with emotion embedding for speech emotion recognition
    Zhang, Chenghao
    Xue, Lei
    IEEE Access, 2021, 9 : 51231 - 51241
  • [45] English speech emotion recognition method based on speech recognition
    Liu, Man
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25 (2) : 391 - 398
  • [46] English speech emotion recognition method based on speech recognition
    Man Liu
    International Journal of Speech Technology, 2022, 25 : 391 - 398
  • [47] Emotion Recognition in Arabic Speech
    Klaylat, Samira
    Hamandi, Lama
    Osman, Ziad
    Zantout, Rached
    2017 SENSORS NETWORKS SMART AND EMERGING TECHNOLOGIES (SENSET), 2017,
  • [48] Multiroom Speech Emotion Recognition
    Shalev, Erez
    Cohen, Israel
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 135 - 139
  • [49] Emotion recognition in Arabic speech
    Samira Klaylat
    Ziad Osman
    Lama Hamandi
    Rached Zantout
    Analog Integrated Circuits and Signal Processing, 2018, 96 : 337 - 351
  • [50] Persian Speech Emotion Recognition
    Savargiv, Mohammad
    Bastanfard, Azam
    2015 7TH CONFERENCE ON INFORMATION AND KNOWLEDGE TECHNOLOGY (IKT), 2015,