SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION

Cited by: 24

Authors:

Gat, Itai [1]

Aronowitz, Hagai [1]

Zhu, Weizhong [1]

Morais, Edmilson [1]

Hoory, Ron [1]

Affiliations:

[1] IBM Res AI, Albany, NY 12203 USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

Speech emotion recognition; speaker normalization; self-supervised learning

DOI:

10.1109/ICASSP43922.2022.9747460

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation. We demonstrate the efficacy of our method on both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.


Pages: 7342-7346

Page count: 5

50 items in total

[21] EXTRACTING SPEAKER AND EMOTION INFORMATION FROM SELF-SUPERVISED SPEECH MODELS VIA CHANNEL-WISE CORRELATIONS
Stafylakis, Themos
Mosner, Ladislav
Kakouros, Sofoklis
Plchot, Oldrich
Burget, Lukas
Cernocky, Jan
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 1136 - 1143
[22] Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
Mohamed, Mukhtar
Liu, Danyi
Tang, Hao
Goldwater, Sharon
INTERSPEECH 2024, 2024, : 3625 - 3629
[23] Self-supervised utterance order prediction for emotion recognition in conversations
Jiang, Dazhi
Liu, Hao
Tu, Geng
Wei, Runguo
Cambria, Erik
NEUROCOMPUTING, 2024, 577
[24] Self-Supervised EEG Emotion Recognition Models Based on CNN
Wang, Xingyi
Ma, Yuliang
Cammon, Jared
Fang, Feng
Gao, Yunyuan
Zhang, Yingchun
IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2023, 31 : 1952 - 1962
[25] Self-Supervised EEG Representation Learning for Robust Emotion Recognition
Liu, Huan
Zhang, Yuzhe
Chen, Xuxu
Zhang, Dalin
Li, Rui
Qin, Tao
ACM TRANSACTIONS ON SENSOR NETWORKS, 2024, 20 (05)
[26] SELF-SUPERVISED LEARNING FOR ECG-BASED EMOTION RECOGNITION
Sarkar, Pritam
Etemad, Ali
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3217 - 3221
[27] Transformer-Based Self-Supervised Learning for Emotion Recognition
Vazquez-Rodriguez, Juan
Lefebvre, Gregoire
Cumin, Julien
Crowley, James L.
2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 2605 - 2612
[28] Incorporating Visual Information in Audio Based Self-Supervised Speaker Recognition
Cai, Danwei
Wang, Weiqing
Li, Ming
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1422 - 1435
[29] Improving speech emotion recognition by fusing self-supervised learning and spectral features via mixture of experts
Hyeon, Jonghwan
Oh, Yung-Hwan
Lee, Young-Jun
Choi, Ho-Jin
DATA & KNOWLEDGE ENGINEERING, 2024, 150
[30] Improving Speech Emotion Recognition Using Self-Supervised Learning with Domain-Specific Audiovisual Tasks
Goncalves, Lucas
Busso, Carlos
INTERSPEECH 2022, 2022, : 1168 - 1172
