SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION

Cited by: 24
Authors
Gat, Itai [1]
Aronowitz, Hagai [1]
Zhu, Weizhong [1]
Morais, Edmilson [1]
Hoory, Ron [1]
Affiliation
[1] IBM Res AI, Albany, NY 12203 USA
Keywords
Speech emotion recognition; speaker normalization; self-supervised learning
DOI
10.1109/ICASSP43922.2022.9747460
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploiting those biases and finding shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversarial learning framework that learns a speech emotion recognition task while normalizing speaker characteristics out of the feature representation. We demonstrate the efficacy of our method in both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.
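The abstract does not detail how the adversary is trained. One common way to make a shared encoder discard speaker information is a gradient reversal layer, as used in domain-adversarial training: the speaker classifier is trained normally, but its gradient is negated and scaled by a weight before it reaches the encoder. The pure-Python sketch below illustrates only that generic technique on a toy model; it is a stand-in under stated assumptions, not the authors' implementation. The function `train_step`, the parameter names `w`, `a`, `b` and `lam`, and the one-dimensional linear encoder with binary logistic emotion and speaker heads are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def train_step(params: dict, x: list, y_emo: int, y_spk: int,
               lr: float = 0.1, lam: float = 1.0) -> dict:
    """One SGD step of a toy gradient-reversal setup (hypothetical model).

    params["w"]: encoder weights; the shared feature is h = w . x
    params["a"]: emotion-head weight (binary emotion logit = a * h)
    params["b"]: speaker-head weight (binary speaker logit = b * h)
    """
    w, a, b = params["w"], params["a"], params["b"]

    # Forward pass through the shared encoder and both heads.
    h = sum(wi * xi for wi, xi in zip(w, x))
    p_emo = sigmoid(a * h)
    p_spk = sigmoid(b * h)

    # Binary cross-entropy gradients with respect to each head's logit.
    d_emo = p_emo - y_emo
    d_spk = p_spk - y_spk

    # Each head minimises its own loss (no reversal for the heads).
    grad_a = d_emo * h
    grad_b = d_spk * h

    # Gradients flowing back from each head into the shared feature h.
    dh_emo = d_emo * a
    dh_spk = d_spk * b

    # Gradient reversal: the speaker gradient is negated and scaled by lam,
    # so the encoder descends on L_emo - lam * L_spk.
    dh = dh_emo - lam * dh_spk
    grad_w = [dh * xi for xi in x]

    return {
        "w": [wi - lr * g for wi, g in zip(w, grad_w)],
        "a": a - lr * grad_a,
        "b": b - lr * grad_b,
    }


# Usage on a single toy two-dimensional sample (values are illustrative only).
params = {"w": [0.5, -0.5], "a": 1.0, "b": 2.0}
params = train_step(params, [1.0, 1.0], y_emo=1, y_spk=0, lr=0.1, lam=1.0)
print(params)
```

With `lam=0` the encoder update reduces to ordinary emotion training; a larger `lam` pushes the encoder harder to make `h` uninformative about the speaker label, while the speaker head itself keeps trying to predict it. In domain-adversarial training this weight is often ramped up from 0 during training so that the adversary does not dominate the early updates.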
Pages: 7342-7346
Page count: 5