Reinforcement Learning based Data Augmentation for Noise Robust Speech Emotion Recognition

Authors
Ranjan, Sumit [1]
Chakraborty, Rupayan [1]
Kopparapu, Sunil Kumar [1]
Affiliations
[1] Tata Consultancy Serv Ltd, TCS Res, Bengaluru, India
Source
Keywords
speech emotion recognition; noise robustness; selective data augmentation; reinforcement learning
DOI
10.21437/Interspeech.2024-921
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech emotion recognition (SER) is an indispensable component of human-machine interaction and enables building empathetic voice user interfaces. The ability to accurately recognize emotion in noisy environments is important in practical scenarios where a person interacts with a machine or an agent, as in a voice-based call center. In this paper, we propose a reinforcement learning (RL) based data augmentation technique for building a robust SER system. The reward function used in RL enables selective picking of noises spread over different frequency bands for data augmentation. We show that the proposed RL based augmentation technique is superior to a recently proposed random selection based technique for the noise robust SER task. We use the IEMOCAP dataset with four emotion classes to validate the proposed technique. Moreover, we test the noise robustness of the SER system in both cross-corpus and cross-language scenarios.
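The abstract's idea of reward-driven noise selection can be illustrated with a minimal sketch. This is not the authors' implementation: the epsilon-greedy bandit, the band labels ("low", "mid", "high"), the class and function names, and the reward signal (for example, a change in validation accuracy after augmenting with the chosen noise) are all assumptions made for illustration. The sketch shows two pieces: mixing a noise clip into speech at a target signal-to-noise ratio, and keeping a running reward estimate per noise band so that more useful bands are picked more often.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling the noise to the requested SNR in dB."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Choose gain g so that p_speech / (g^2 * p_noise) == 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


class EpsilonGreedyNoiseSelector:
    """Hypothetical selector: an epsilon-greedy bandit over noise bands."""

    def __init__(self, noise_bands, epsilon=0.1, seed=0):
        self.bands = list(noise_bands)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {b: 0 for b in self.bands}
        self.values = {b: 0.0 for b in self.bands}  # running mean reward per band

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.bands)
        return max(self.bands, key=lambda b: self.values[b])

    def update(self, band, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n.
        self.counts[band] += 1
        self.values[band] += (reward - self.values[band]) / self.counts[band]
```

In a training loop, one would call `select()` to choose a band, augment a batch with `mix_at_snr` using a noise clip from that band, measure some reward, and pass it to `update()`. A random selection baseline corresponds to `epsilon=1.0`.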
Pages: 1040-1044
Page count: 5