Reinforcement Learning based Data Augmentation for Noise Robust Speech Emotion Recognition

Times cited: 0
Authors
Ranjan, Sumit [1 ]
Chakraborty, Rupayan [1 ]
Kopparapu, Sunil Kumar [1 ]
Affiliation
[1] Tata Consultancy Serv Ltd, TCS Res, Bengaluru, India
Source
Keywords
speech emotion recognition; noise robustness; selective data augmentation; reinforcement learning
DOI
10.21437/Interspeech.2024-921
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech emotion recognition (SER) is an indispensable component of human-machine interaction and enables building empathetic voice user interfaces. The ability to accurately recognize emotion in noisy environments is important in practical scenarios where a person interacts with a machine or an agent, as in a voice-based call center. In this paper, we propose a reinforcement learning (RL) based data augmentation technique for building a robust SER system. The reward function used in RL makes it possible to select noises spread over different frequency bands for data augmentation. We show that the proposed RL-based augmentation technique is superior to a recently proposed random-selection-based technique on the noise-robust SER task. We use the IEMOCAP dataset with four emotion classes to validate the proposed technique. Moreover, we test the noise robustness of the SER system in both cross-corpus and cross-language scenarios.
Pages: 1040-1044
Page count: 5