SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION

Citations: 24

Authors:

Gat, Itai [1]

Aronowitz, Hagai [1]

Zhu, Weizhong [1]

Morais, Edmilson [1]

Hoory, Ron [1]

Affiliations:

[1] IBM Res AI, Albany, NY 12203 USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

Speech emotion recognition; speaker normalization; self-supervised learning

DOI:

10.1109/ICASSP43922.2022.9747460

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation. We demonstrate the efficacy of our method on both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.
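
The abstract does not spell out the adversarial mechanism. One common way to realize gradient-based adversarial training is gradient reversal: the speaker classifier descends its own loss, while the shared encoder receives that gradient with the sign flipped, so it is pushed away from features that reveal the speaker. The sketch below is a toy illustration in plain Python, not the authors' implementation. The linear model, the squared-error losses, the function name adversarial_step, and the parameters lam and lr are all assumptions introduced for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def adversarial_step(w, a, b, x, y_emotion, y_speaker, lam=1.0, lr=0.1):
    """One SGD step on a toy linear model with a reversed speaker gradient.

    Hypothetical toy setup (not from the paper):
      w : encoder weights, feature z = w . x (a scalar)
      a : emotion-head weight, prediction a * z
      b : speaker-head weight, prediction b * z
      losses: L_e = 0.5 * (a*z - y_emotion)**2, L_s = 0.5 * (b*z - y_speaker)**2
    """
    z = dot(w, x)
    err_e = a * z - y_emotion  # emotion residual
    err_s = b * z - y_speaker  # speaker residual

    # Each head descends its own loss as usual.
    grad_a = err_e * z
    grad_b = err_s * z

    # Encoder: follow the emotion gradient, but REVERSE the speaker gradient
    # (scaled by lam), so the feature becomes less useful for speaker ID.
    grad_w = [err_e * a * xi - lam * err_s * b * xi for xi in x]

    new_w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    return new_w, a - lr * grad_a, b - lr * grad_b


if __name__ == "__main__":
    w, a, b = [1.0, 0.0], 1.0, 1.0
    x = [1.0, 1.0]
    print(adversarial_step(w, a, b, x, y_emotion=0.0, y_speaker=0.0, lam=1.0))
    print(adversarial_step(w, a, b, x, y_emotion=0.0, y_speaker=0.0, lam=0.0))
```

Setting lam to 0 recovers ordinary training on the emotion loss alone; larger values trade emotion accuracy against speaker invariance. In a deep-learning framework the same sign flip is usually implemented as a custom layer that is the identity in the forward pass and negates the gradient in the backward pass.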


Pages: 7342-7346

Number of pages: 5

