Reinforcement Learning based Data Augmentation for Noise Robust Speech Emotion Recognition

Cited by: 0

Authors:

Ranjan, Sumit [1]

Chakraborty, Rupayan [1]

Kopparapu, Sunil Kumar [1]

Affiliations:

[1] Tata Consultancy Services Ltd, TCS Research, Bengaluru, India

Source:

INTERSPEECH 2024 | 2024

Keywords:

speech emotion recognition; noise robustness; selective data augmentation; reinforcement learning

DOI:

10.21437/Interspeech.2024-921

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Speech emotion recognition (SER) is an indispensable component of human-machine interaction and enables building empathetic voice user interfaces. The ability to accurately recognize emotion in noisy environments is important in practical scenarios where a person interacts with a machine or an agent, as in a voice-based call center. In this paper, we propose a reinforcement learning (RL) based data augmentation technique for building a robust SER system. The reward function used in RL enables selective picking of noises spread over different frequency bands for data augmentation. We show that the proposed RL-based augmentation technique is superior to a recently proposed random-selection-based technique for the noise-robust SER task. We use the IEMOCAP dataset with four emotion classes to validate the proposed technique. Moreover, we test the noise robustness of the SER system in both cross-corpus and cross-language scenarios.
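
The abstract does not specify the RL algorithm or the reward, so the following is only an illustrative sketch of the general idea of reward-driven noise selection for augmentation, not the authors' method. It assumes an epsilon-greedy bandit as a stand-in for the RL agent and a caller-supplied scalar reward (for example, a validation gain). The names `mix_at_snr` and `EpsilonGreedyNoiseSelector` are hypothetical.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio is snr_db (dB)."""
    # Tile or truncate the noise so it matches the speech length.
    noise = np.resize(noise, speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)  # assumed non-zero
    # Choose scale so that speech_power / (scale**2 * noise_power) = 10**(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


class EpsilonGreedyNoiseSelector:
    """Hypothetical stand-in for the RL agent: one bandit arm per candidate noise."""

    def __init__(self, n_noises, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.counts = np.zeros(n_noises, dtype=int)
        self.values = np.zeros(n_noises)  # running mean reward per noise

    def select(self):
        # Explore with probability epsilon, otherwise pick the best noise so far.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.values)))
        return int(np.argmax(self.values))

    def update(self, arm, reward):
        # Incremental update of the running mean reward for the chosen noise.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In such a loop, one would call `select()` to choose a noise, augment a batch with `mix_at_snr`, train or evaluate the SER model, and feed the resulting score back through `update()`. A selector with `epsilon=1.0` reduces to uniform random noise selection.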


Pages: 1040-1044

Page count: 5
