SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION

Cited by: 24

Authors:

Gat, Itai [1]

Aronowitz, Hagai [1]

Zhu, Weizhong [1]

Morais, Edmilson [1]

Hoory, Ron [1]

Affiliation:

[1] IBM Res AI, Albany, NY 12203 USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

Speech emotion recognition; speaker normalization; self-supervised learning

DOI:

10.1109/ICASSP43922.2022.9747460

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation. We demonstrate the efficacy of our method on both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.
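The abstract does not spell out the adversary's update rule. As a rough illustration only, the sketch below uses gradient reversal, a common adversarial technique that is swapped in here and is not confirmed by this record to be the authors' method. It assumes a toy scalar model in which a shared encoder weight feeds an emotion head and a speaker head; the function name, parameters, squared-error losses, and the weighting factor lam are all hypothetical.

```python
def reversed_encoder_gradient(w, a, b, x, emotion, speaker, lam):
    """Gradient for a shared encoder weight w under gradient reversal.

    Toy scalar model (every name here is an illustrative assumption):
      feature       z = w * x
      emotion loss  = (a * z - emotion) ** 2
      speaker loss  = (b * z - speaker) ** 2
    """
    z = w * x
    # Ordinary gradients of each loss with respect to w (chain rule).
    grad_emotion = 2.0 * (a * z - emotion) * a * x
    grad_speaker = 2.0 * (b * z - speaker) * b * x
    # Reversal: the encoder descends on the emotion loss while ascending
    # on the speaker loss, scaled by lam, so speaker cues are discouraged.
    return grad_emotion - lam * grad_speaker


if __name__ == "__main__":
    # With lam = 0 the encoder sees only the emotion gradient.
    print(reversed_encoder_gradient(1.0, 1.0, 1.0, 1.0, 0.0, 0.0, lam=0.0))
    # With lam > 0 the speaker gradient is subtracted from it.
    print(reversed_encoder_gradient(1.0, 1.0, 1.0, 1.0, 0.0, 0.0, lam=0.5))
```

In this kind of setup the speaker head itself would still be trained with its ordinary, unreversed gradient; only the shared encoder receives the sign-flipped term.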

Citation

Pages: 7342-7346

Number of pages: 5

50 records in total

[31] ON THE USE OF SELF-SUPERVISED PRE-TRAINED ACOUSTIC AND LINGUISTIC FEATURES FOR CONTINUOUS SPEECH EMOTION RECOGNITION
Macary, Manon
Tahon, Marie
Esteve, Yannick
Rousseau, Anthony
2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 373 - 380
[32] SELF-SUPERVISED SPEAKER VERIFICATION WITH SIMPLE SIAMESE NETWORK AND SELF-SUPERVISED REGULARIZATION
Sang, Mufan
Li, Haoqi
Liu, Fang
Arnold, Andrew O.
Wan, Li
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6127 - 6131
[33] LAYER-WISE ANALYSIS OF SELF-SUPERVISED ACOUSTIC WORD EMBEDDINGS: A STUDY ON SPEECH EMOTION RECOGNITION
Saliba, Alexandra
Li, Yuanchao
Sanabria, Ramon
Lai, Catherine
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 590 - 594
[34] SOT: Self-supervised Learning-Assisted Optimal Transport for Unsupervised Adaptive Speech Emotion Recognition
Zhang, Ruiteng
Wei, Jianguo
Lu, Xugang
Li, Yongwei
Xu, Junhai
Jin, Di
Tao, Jianhua
INTERSPEECH 2023, 2023, : 1858 - 1862
[35] Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations
Khaertdinov, Bulat
Jeuris, Pedro
Sousa, Annanda
Hortal, Enrique
INTERSPEECH 2024, 2024, : 4708 - 4712
[36] Breaking Barriers with Enhanced DINO Framework and Score Normalization to Self-supervised Speaker Verification
Wan, Xianmei
Zhan, Xiaosi
Li, Na
Liao, Guihua
PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING, ICDSP 2024, 2024, : 158 - 164
[37] On Separate Normalization in Self-supervised Transformers
Chen, Xiaohui
Wang, Yinkai
Du, Yuanqi
Hassoun, Soha
Liu, Li-Ping
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[38] Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition
Violeta, Lester Phillip
Huang, Wen-Chin
Toda, Tomoki
INTERSPEECH 2022, 2022, : 41 - 45
[39] Robust Self-Supervised Audio-Visual Speech Recognition
Shi, Bowen
Hsu, Wei-Ning
Mohamed, Abdelrahman
INTERSPEECH 2022, 2022, : 2118 - 2122
[40] Domain Adaptive Self-supervised Training of Automatic Speech Recognition
Do, Cong-Thanh
Doddipatla, Rama
Li, Mohan
Hain, Thomas
INTERSPEECH 2023, 2023, : 4389 - 4393
