Reinforcement Learning based Data Augmentation for Noise Robust Speech Emotion Recognition

Authors
Ranjan, Sumit [1]
Chakraborty, Rupayan [1]
Kopparapu, Sunil Kumar [1]
Affiliations
[1] Tata Consultancy Serv Ltd, TCS Res, Bengaluru, India
Source
Keywords
speech emotion recognition; noise robustness; selective data augmentation; reinforcement learning
DOI
10.21437/Interspeech.2024-921
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech emotion recognition (SER) is an indispensable component of human-machine interaction and enables building empathetic voice user interfaces. The ability to accurately recognize emotion in noisy environments is important in practical scenarios where a person interacts with a machine or an agent, as in a voice-based call center. In this paper, we propose a reinforcement learning (RL) based data augmentation technique for building a robust SER system. The reward function used in RL enables selective picking of noises spread over different frequency bands for data augmentation. We show that the proposed RL-based augmentation technique is superior to a recently proposed random-selection-based technique on the noise-robust SER task. We use the IEMOCAP dataset with four emotion classes to validate the proposed technique. Moreover, we test the noise robustness of the SER system in both cross-corpus and cross-language scenarios.
Pages: 1040-1044
Page count: 5