In domain training data augmentation on noise robust Punjabi Children speech recognition

Cited by: 0
Authors
Virender Kadyan
Puneet Bawa
Taniya Hasija
Affiliations
[1] University of Petroleum and Energy Studies (UPES), Speech and Language Research Centre, School of Computer Science
[2] Chitkara University Institute of Engineering and Technology, Centre of Excellence for Speech and Multimodal Laboratory
[3] Chitkara University
Keywords
Mel frequency-gammatone frequency cepstral coefficient (MF-GFCC); Vocal tract length normalization (VTLN); Data augmentation; Feature warping
DOI: Not available
Abstract
Building a successful automatic speech recognition (ASR) engine requires a large amount of training data. Collecting such data increases training complexity and is impractical for a low-resource language like Punjabi, which has no children's speech corpus. Consequently, data scarcity and the short vocal tract length of child speakers further degrade system performance under limited-data conditions. Moreover, Punjabi is a tonal language, which makes building an optimized ASR system for it nearly impossible. In this paper, we explore a fused feature extraction approach that handles this training complexity using a mel frequency-gammatone frequency cepstral coefficient (MF-GFCC) technique combined with a feature warping method. Efforts have been made to develop a children's ASR engine using data augmentation under limited-data scenarios. For that purpose, we study in-domain data augmentation that artificially combines noisy and clean corpora to overcome the scarcity of training data; the combined dataset is then processed with the fused feature extraction approach. In addition, the tonal characteristics and the child vocal tract length issues are addressed by inducing pitch features and a training normalization strategy based on vocal tract length normalization (VTLN). Pooling the augmented and original speech signals is observed to reduce the word error rate (WER), with relative improvements (RI) of 20.59% under noisy and 19.39% under clean environment conditions for the hybrid MF-GFCC approach compared with conventional mel frequency cepstral coefficient (MFCC) and gammatone frequency cepstral coefficient (GFCC) based ASR systems.
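As a rough illustration of the in-domain augmentation step described in the abstract (pooling clean utterances with artificially noise-corrupted copies of the same data), the Python/NumPy sketch below mixes a noise clip into a clean signal at a chosen signal-to-noise ratio. The function and variable names, the SNR value, and the placeholder signals are illustrative assumptions, not the paper's actual pipeline.

import numpy as np

def mix_noise_at_snr(clean, noise, snr_db):
    # Additively mix a noise segment into clean speech at a target SNR (dB).
    # Tile or truncate the noise so it covers the whole utterance.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    # Scale the noise so that 10*log10(P_clean / P_noise_scaled) equals snr_db.
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

# Pool the original clean utterances with their noise-corrupted copies,
# mirroring the combined "augmented + original" training set idea.
rng = np.random.default_rng(0)
clean_utts = [rng.standard_normal(16000) for _ in range(3)]  # placeholder audio
noise_clip = rng.standard_normal(8000)                       # placeholder noise
train_set = clean_utts + [mix_noise_at_snr(u, noise_clip, snr_db=10.0)
                          for u in clean_utts]

In practice the noise would come from recordings matching the target deployment conditions, and the enlarged set would then be passed to the feature extraction stage (e.g., the fused MF-GFCC front end).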
Pages: 2705-2721
Number of pages: 16
Related Papers
50 items in total
  • [21] Improving Turkish Telephone Speech Recognition with Data Augmentation and Out of Domain Data
    Uslu, Zeynep Gulhan
    Yildirim, Tulay
    2019 16TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2019, : 176 - 179
  • [22] A COMPARISON OF STREAMING MODELS AND DATA AUGMENTATION METHODS FOR ROBUST SPEECH RECOGNITION
    Kim, Jiyeon
    Kumar, Mehul
    Gowda, Dhananjaya
    Garg, Abhinav
    Kim, Chanwoo
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 989 - 995
  • [23] Data augmentation using generative adversarial networks for robust speech recognition
    Qian, Yanmin
    Hu, Hu
    Tan, Tian
    SPEECH COMMUNICATION, 2019, 114 : 1 - 9
  • [24] Analysis for Using Noise as a Source of Data Augmentation for Dysarthric Speech Recognition
    Nawroly, Sarkhell Sirwan
    Popescu, Decebal
    Celin, T. A. Mariya
    Jeeva, M. P. Actlin
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2025,
  • [25] On Practical Aspects of Multi-condition Training Based on Augmentation for Reverberation-/Noise-Robust Speech Recognition
    Malek, Jiri
    Zdansky, Jindrich
    TEXT, SPEECH, AND DIALOGUE (TSD 2019), 2019, 11697 : 251 - 263
  • [26] Robust Speech Recognition in the presence of noise using medical data
    Athanaselis, Theologos
    Bakamidis, Stelios
    Giannopoulos, George
    Dologlou, Ioannis
    Fotinea, Evita
    2008 IEEE INTERNATIONAL WORKSHOP ON IMAGING SYSTEMS AND TECHNIQUES, 2008, : 347 - 350
  • [27] Matching training and test data distributions for robust speech recognition
    Molau, S
    Keysers, D
    Ney, H
    SPEECH COMMUNICATION, 2003, 41 (04) : 579 - 601
  • [28] Modulation Spectrum Augmentation for Robust Speech Recognition
    Yan, Bi-Cheng
    Liu, Shih-Hung
    Chen, Berlin
    PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION SCIENCE AND SYSTEM, AISS 2019, 2019,
  • [29] Data Augmentation using Conditional Generative Adversarial Networks for Robust Speech Recognition
    Sheng, Peiyao
    Yang, Zhuolin
    Hu, Hu
    Tan, Tian
    Qian, Yanmin
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 121 - 125
  • [30] Cepstral domain segmental feature vector normalization for noise robust speech recognition
    Viikki, O
    Laurila, K
    SPEECH COMMUNICATION, 1998, 25 (1-3) : 133 - 147