Reinforcement Learning based Data Augmentation for Noise Robust Speech Emotion Recognition

Cited by: 0

Authors:

Ranjan, Sumit [1]

Chakraborty, Rupayan [1]

Kopparapu, Sunil Kumar [1]

Affiliations:

[1] Tata Consultancy Serv Ltd, TCS Res, Bengaluru, India

Source:

INTERSPEECH 2024 | 2024

Keywords:

speech emotion recognition; noise robustness; selective data augmentation; reinforcement learning

DOI:

10.21437/Interspeech.2024-921

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech emotion recognition (SER) is an indispensable component of human-machine interaction and enables building empathetic voice user interfaces. The ability to accurately recognize emotion in noisy environments is important in practical scenarios where a person interacts with a machine or an agent, as in a voice-based call center. In this paper, we propose a reinforcement learning (RL) based data augmentation technique for building a robust SER system. The reward function used in RL enables selective picking of noises spread over different frequency bands for data augmentation. We show that the proposed RL-based augmentation technique is superior to a recently proposed random-selection-based technique for the noise-robust SER task. We use the IEMOCAP dataset with four emotion classes to validate the proposed technique. Moreover, we test the noise robustness of the SER system in both cross-corpus and cross-language scenarios.

Pages: 1040-1044

Page count: 5
