Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition

Cited by: 1
Authors
Wang, Shijun [1 ]
Hemati, Hamed [1 ]
Gudnason, Jon [2 ]
Borth, Damian [1 ]
Affiliations
[1] Univ St Gallen, St Gallen, Switzerland
[2] Reykjavik Univ, Reykjavik, Iceland
Source
INTERSPEECH 2022
Keywords
speech emotion recognition; speech augmentation; cross-lingual; adversarial networks; StarGAN
DOI
10.21437/Interspeech.2022-10667
CLC Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Speech Emotion Recognition (SER) is crucial for human-computer interaction but remains challenging because of two major obstacles: data scarcity and class imbalance. Many SER datasets are substantially imbalanced, with utterances of one class (most often Neutral) far more frequent than those of the other classes. Furthermore, only a few data resources are available for many spoken languages. To address these problems, we exploit a GAN-based augmentation model guided by a triplet network to improve SER performance given imbalanced and insufficient training data. Our experiments demonstrate that: 1) on a highly imbalanced dataset, our augmentation strategy significantly improves SER performance (+8% recall compared with the baseline); 2) in a cross-lingual benchmark, where a model is trained with ample source-language utterances but very few target-language utterances (around 50 in our experiments), our augmentation strategy improves SER performance for all three target languages.
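As a rough illustration of the guidance idea (this record contains no code, and the sketch below is not the authors' implementation), a triplet network can steer the generator via a standard triplet margin loss in its embedding space: a generated utterance conditioned on a target emotion is pulled toward a real anchor of that emotion and pushed away from a real utterance of a different emotion. A minimal sketch, assuming PyTorch; embed_net, the tensor arguments, and the weight lambda_triplet are hypothetical names:

import torch.nn.functional as F

def triplet_guidance_loss(embed_net, real_anchor, generated, real_negative,
                          margin=1.0):
    # Embed a real utterance of the target emotion (anchor), the GAN output
    # conditioned on that emotion (positive), and a real utterance of a
    # different emotion (negative). embed_net is a pretrained triplet network.
    e_a = embed_net(real_anchor)
    e_p = embed_net(generated)
    e_n = embed_net(real_negative)
    d_pos = F.pairwise_distance(e_a, e_p)  # should shrink during training
    d_neg = F.pairwise_distance(e_a, e_n)  # should grow during training
    # Standard triplet margin loss, averaged over the batch.
    return F.relu(d_pos - d_neg + margin).mean()

# This term would typically be added to the generator's adversarial loss,
# e.g. loss_G = loss_adv + lambda_triplet * triplet_guidance_loss(...)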
Pages: 391-395
Page count: 5