SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION

Cited by: 24
Authors
Gat, Itai [1]
Aronowitz, Hagai [1]
Zhu, Weizhong [1]
Morais, Edmilson [1]
Hoory, Ron [1]
Affiliations
[1] IBM Research AI, Albany, NY 12203, USA
Keywords
Speech emotion recognition; speaker normalization; self-supervised learning
DOI
10.1109/ICASSP43922.2022.9747460
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation. We demonstrate the efficacy of our method on both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.
Pages: 7342 - 7346
Page count: 5
Related papers
50 in total
  • [21] EXTRACTING SPEAKER AND EMOTION INFORMATION FROM SELF-SUPERVISED SPEECH MODELS VIA CHANNEL-WISE CORRELATIONS
    Stafylakis, Themos
    Mosner, Ladislav
    Kakouros, Sofoklis
    Plchot, Oldrich
    Burget, Lukas
    Cernocky, Jan
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 1136 - 1143
  • [22] Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
    Mohamed, Mukhtar
    Liu, Danyi
    Tang, Hao
    Goldwater, Sharon
    INTERSPEECH 2024, 2024, : 3625 - 3629
  • [23] Self-supervised utterance order prediction for emotion recognition in conversations
    Jiang, Dazhi
    Liu, Hao
    Tu, Geng
    Wei, Runguo
    Cambria, Erik
    NEUROCOMPUTING, 2024, 577
  • [24] Self-Supervised EEG Emotion Recognition Models Based on CNN
    Wang, Xingyi
    Ma, Yuliang
    Cammon, Jared
    Fang, Feng
    Gao, Yunyuan
    Zhang, Yingchun
    IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2023, 31 : 1952 - 1962
  • [25] Self-Supervised EEG Representation Learning for Robust Emotion Recognition
    Liu, Huan
    Zhang, Yuzhe
    Chen, Xuxu
    Zhang, Dalin
    Li, Rui
    Qin, Tao
    ACM TRANSACTIONS ON SENSOR NETWORKS, 2024, 20 (05)
  • [26] SELF-SUPERVISED LEARNING FOR ECG-BASED EMOTION RECOGNITION
    Sarkar, Pritam
    Etemad, Ali
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3217 - 3221
  • [27] Transformer-Based Self-Supervised Learning for Emotion Recognition
    Vazquez-Rodriguez, Juan
    Lefebvre, Gregoire
    Cumin, Julien
    Crowley, James L.
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 2605 - 2612
  • [28] Incorporating Visual Information in Audio Based Self-Supervised Speaker Recognition
    Cai, Danwei
    Wang, Weiqing
    Li, Ming
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1422 - 1435
  • [29] Improving speech emotion recognition by fusing self-supervised learning and spectral features via mixture of experts
    Hyeon, Jonghwan
    Oh, Yung-Hwan
    Lee, Young-Jun
    Choi, Ho-Jin
    DATA & KNOWLEDGE ENGINEERING, 2024, 150
  • [30] Improving Speech Emotion Recognition Using Self-Supervised Learning with Domain-Specific Audiovisual Tasks
    Goncalves, Lucas
    Busso, Carlos
    INTERSPEECH 2022, 2022, : 1168 - 1172