Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition

Times cited: 1

Authors:

Wang, Shijun [1]

Hemati, Hamed [1]

Gudnason, Jon [2]

Borth, Damian [1]

Affiliations:

[1] University of St. Gallen, St. Gallen, Switzerland

[2] Reykjavik University, Reykjavik, Iceland

Source:

INTERSPEECH 2022 | 2022

Keywords:

speech emotion recognition; speech augmentation; cross-lingual; adversarial networks; StarGAN

DOI:

10.21437/Interspeech.2022-10667

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Speech Emotion Recognition (SER) is crucial for human-computer interaction but remains a challenging problem because of two major obstacles: data scarcity and class imbalance. Many SER datasets are substantially imbalanced, with utterances of one class (most often Neutral) far more frequent than those of the other classes. Furthermore, many spoken languages have only a few data resources available. To address these problems, we exploit a GAN-based augmentation model guided by a triplet network to improve SER performance when training data are imbalanced and insufficient. Our experiments demonstrate that: 1) on a highly imbalanced dataset, our augmentation strategy significantly improves SER performance (+8% recall compared with the baseline); and 2) in a cross-lingual benchmark, where a model is trained with ample source-language utterances but very few target-language utterances (around 50 in our experiments), our augmentation strategy improves SER performance for all three target languages.
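The abstract says the augmentation model is guided by a triplet network but does not state the loss it uses. As a hedged illustration only, the sketch below implements the standard triplet margin loss for a single (anchor, positive, negative) triple of embedding vectors. The function names, the use of plain Euclidean distance, the default margin of 1.0, and the idea of using a generated utterance's embedding as the anchor are all assumptions for illustration, not details taken from the paper.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # assumed default, not from the paper
) -> float:
    """Standard triplet margin loss for one triple.

    Hypothetical SER reading: `anchor` could be the embedding of a generated
    utterance, `positive` a real utterance of the intended emotion, and
    `negative` a real utterance of a different emotion.

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive; otherwise it grows linearly with the violation.
    """
    d_pos = euclidean(anchor, positive)  # distance to same-class example
    d_neg = euclidean(anchor, negative)  # distance to other-class example
    return max(0.0, d_pos - d_neg + margin)
```

For example, `triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 0.5])` returns 1.5 because the negative lies closer to the anchor than the positive; moving the negative to `[3.0, 4.0]` (distance 5) drives the loss to 0.0. Minimizing this quantity pushes embeddings of the same class together and different classes apart, which is the general sense in which a triplet objective can "guide" a generator.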


Pages: 391-395

Number of pages: 5
