ISNet: Individual Standardization Network for Speech Emotion Recognition

Cited by: 23
Authors
Fan, Weiquan [1]
Xu, Xiangmin [1]
Cai, Bolun [1]
Xing, Xiaofen [1]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat, Guangzhou 510640, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech recognition; Emotion recognition; Feature extraction; Benchmark testing; Standardization; Speech processing; Task analysis; Individual standardization network (ISNet); speech emotion recognition; individual differences; metric; dataset; CLASSIFICATION; ATTENTION; FEATURES; VOICE;
DOI
10.1109/TASLP.2022.3171965
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Speech emotion recognition plays an essential role in human-computer interaction. However, cross-individual representation learning and individual-agnostic systems are challenging due to the distribution deviation caused by individual differences. The existing related approaches mostly use the auxiliary task of speaker recognition to eliminate individual differences. Unfortunately, although these methods can reduce interindividual voiceprint differences, it is difficult to dissociate interindividual expression differences since each individual has its unique expression habits. In this paper, we propose an individual standardization network (ISNet) for speech emotion recognition to alleviate the problem of interindividual emotion confusion caused by individual differences. Specifically, we model individual benchmarks as representations of nonemotional neutral speech, and ISNet realizes individual standardization using the automatically generated benchmark, which improves the robustness of individual-agnostic emotion representations. In response to individual differences, we also propose more comprehensive and meaningful individual-level evaluation metrics. In addition, we continue our previous work to construct a challenging large-scale speech emotion dataset (LSSED). We propose a more reasonable division method of the training set and testing set to prevent individual information leakage. Experimental results on datasets of both large and small scales have proven the effectiveness of ISNet, and the new state-of-the-art performance is achieved under the same experimental conditions on IEMOCAP and LSSED.
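The abstract mentions splitting the training and testing sets so that no individual's information leaks between them, and evaluating at the individual level. The sketch below illustrates those two ideas in general terms only. The function names, the `(speaker_id, features, label)` sample layout, and the choice of per-speaker accuracy averaged over speakers are assumptions made for illustration; the abstract does not define the paper's actual split procedure or metrics.

```python
from collections import defaultdict
import random


def speaker_disjoint_split(samples, test_fraction=0.2, seed=0):
    """Split samples so that no speaker appears in both train and test.

    `samples` is a list of (speaker_id, features, label) tuples.
    This layout is a hypothetical one chosen for the example.
    """
    speakers = sorted({s[0] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    # Hold out whole speakers, not individual utterances.
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train = [s for s in samples if s[0] not in test_speakers]
    test = [s for s in samples if s[0] in test_speakers]
    return train, test


def per_speaker_mean_accuracy(speaker_ids, y_true, y_pred):
    """Compute accuracy per speaker, then average over speakers.

    Each speaker contributes equally, regardless of utterance count.
    This is an assumed stand-in for an individual-level metric.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for spk, t, p in zip(speaker_ids, y_true, y_pred):
        total[spk] += 1
        correct[spk] += int(t == p)
    return sum(correct[s] / total[s] for s in total) / len(total)
```

Averaging per speaker differs from utterance-level accuracy when speakers contribute unequal numbers of utterances: with three correct predictions for one speaker and one wrong prediction for another, utterance-level accuracy is 0.75 while the per-speaker mean is 0.5.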
Pages: 1803-1814
Page count: 12