Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition

Cited by: 1
Authors
Wang, Shijun [1 ]
Hemati, Hamed [1 ]
Gudnason, Jon [2 ]
Borth, Damian [1 ]
Affiliations
[1] Univ St Gallen, St Gallen, Switzerland
[2] Reykjavik Univ, Reykjavik, Iceland
Keywords
speech emotion recognition; speech augmentation; cross-lingual; adversarial networks; StarGAN
DOI
10.21437/Interspeech.2022-10667
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech Emotion Recognition (SER) is crucial for human-computer interaction but remains a challenging problem because of two major obstacles: data scarcity and imbalance. Many SER datasets are substantially imbalanced: utterances of one class (most often Neutral) are much more frequent than those of other classes. Furthermore, only a few data resources are available for many spoken languages. To address these problems, we exploit a GAN-based augmentation model guided by a triplet network to improve SER performance given imbalanced and insufficient training data. Our experiments demonstrate that: 1) with a highly imbalanced dataset, our augmentation strategy significantly improves SER performance (+8% recall score compared with the baseline); 2) in a cross-lingual benchmark, where we train a model with enough source-language utterances but very few target-language utterances (around 50 in our experiments), our augmentation strategy improves SER performance for all three target languages.
Pages: 391-395
Page count: 5
Related papers
50 in total
  • [31] Real-time speech emotion recognition using deep learning and data augmentation
    Barhoumi, Chawki
    Benayed, Yassine
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 58 (02)
  • [32] Enhanced Speech Emotion Recognition Using Conditional-DCGAN-Based Data Augmentation
    Roh, Kyung-Min
    Lee, Seok-Pil
    APPLIED SCIENCES-BASEL, 2024, 14 (21)
  • [33] Generative Data Augmentation applied to Face Recognition
    Jabberi, Marwa
    Wali, Ali
    Alimi, Adel M.
    2023 INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING, ICOIN, 2023: 242-247
  • [34] On Enhancing Speech Emotion Recognition using Generative Adversarial Networks
    Sahu, Saurabh
    Gupta, Rahul
    Espy-Wilson, Carol
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 3693-3697
  • [35] Speech Emotion Recognition Using Hybrid Generative and Discriminative Models
    Huang, Yongming
    Zhang, Guobao
    Dong, Fei
    Li, Yue
    Da, Feipeng
    PRZEGLAD ELEKTROTECHNICZNY, 2012, 88 (3B): 105-108
  • [36] Dataset-Distillation Generative Model for Speech Emotion Recognition
    Ritter-Gutierrez, Fabian
    Huang, Kuan-Po
    Wong, Jeremy H. M.
    Ng, Dianwen
    Lee, Hung-yi
    Chen, Nancy F.
    Chng, Eng-Siong
    INTERSPEECH 2024, 2024: 2640-2644
  • [37] Effective Data Augmentation Techniques for Arabic Speech Emotion Recognition Using Convolutional Neural Networks
    Bouchelligua, Wided
    Al-Dayil, Reham
    Algaith, Areej
    APPLIED SCIENCES-BASEL, 2025, 15 (04)
  • [38] Performance Improvement of Speech Emotion Recognition Using ResNet Model with Data Augmentation-Saturation
    Lee, Minjeong
    Lee, Miran
    APPLIED SCIENCES-BASEL, 2025, 15 (04)
  • [39] Transformer-Based Multilingual Speech Emotion Recognition Using Data Augmentation and Feature Fusion
    Al-onazi, Badriyya B.
    Nauman, Muhammad Asif
    Jahangir, Rashid
    Malik, Muhmmad Mohsin
    Alkhammash, Eman H.
    Elshewey, Ahmed M.
    APPLIED SCIENCES-BASEL, 2022, 12 (18)
  • [40] STARGAN FOR EMOTIONAL SPEECH CONVERSION: VALIDATED BY DATA AUGMENTATION OF END-TO-END EMOTION RECOGNITION
    Rizos, Georgios
    Baird, Alice
    Elliott, Max
    Schuller, Bjorn
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 3502-3506