In domain training data augmentation on noise robust Punjabi Children speech recognition

Cited by: 0

Authors:

Virender Kadyan

Puneet Bawa

Taniya Hasija

Affiliations:

[1] University of Petroleum and Energy Studies (UPES), Speech and Language Research Centre, School of Computer Science

[2] Chitkara University Institute of Engineering and Technology, Centre of Excellence for Speech and Multimodal Laboratory

[3] Chitkara University

Source:

Journal of Ambient Intelligence and Humanized Computing | 2022 / Vol. 13

Keywords:

Mel frequency-Gammatone frequency cepstral coefficient (MF-GFCC); Vocal tract length normalization (VTLN); Data augmentation; Feature warping

DOI:

Not available

Abstract:

Building a successful automatic speech recognition (ASR) engine requires a large amount of training data. This increases training complexity and becomes infeasible for a low-resource language such as Punjabi, for which no children's speech corpus is available. Data scarcity and the short vocal tract length of child speakers further degrade system performance under limited-data conditions. Moreover, Punjabi is a tonal language, and building an optimized ASR system for such a language is nearly impossible. In this paper, we explore a fused feature extraction approach, the mel frequency-gammatone frequency cepstral coefficient (MF-GFCC) technique applied with a feature warping method, to handle the large training complexity. We develop a children's ASR engine using data augmentation in limited-data scenarios. For that purpose, we study in-domain data augmentation that artificially combines noisy and clean corpora to overcome data scarcity in the training set. The combined dataset is processed with the fused feature extraction approach. In addition, the tonal characteristics and the child vocal tract length issue are addressed by adding pitch features and by a training normalization strategy based on vocal tract length normalization (VTLN). With the hybrid MF-GFCC approach, combining augmented and original speech signals reduces the word error rate (WER), giving a relative improvement (RI) of 20.59% in noisy and 19.39% in clean conditions over conventional mel frequency cepstral coefficient (MFCC) and gammatone frequency cepstral coefficient (GFCC) based ASR systems.
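The abstract does not say how the noisy and clean corpora were combined or how RI was computed, so the following is only a minimal sketch under stated assumptions: noise is added to each clean utterance at a chosen signal-to-noise ratio, the noisy copies are pooled with the originals, and RI is taken as the usual relative reduction in WER against a baseline. The function names (`mix_at_snr`, `augment_train_set`, `relative_improvement`) and the SNR parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean waveform so the mixture has the requested SNR (dB).

    Assumes both inputs are 1-D float arrays with non-zero power.
    """
    # Repeat or trim the noise so it covers the whole clean utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)

    # Choose the gain so that clean_power / (gain^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise


def augment_train_set(clean_utts, noise, snr_db):
    """Return the original clean utterances followed by one noisy copy of each."""
    noisy_utts = [mix_at_snr(utt, noise, snr_db) for utt in clean_utts]
    return list(clean_utts) + noisy_utts


def relative_improvement(wer_baseline, wer_new):
    """Relative WER reduction in percent (assumed definition of RI)."""
    return 100.0 * (wer_baseline - wer_new) / wer_baseline
```

For example, under this assumed definition a drop in WER from 10.0% to 8.0% corresponds to an RI of 20%. A real pipeline would then extract features (here, MF-GFCC with pitch and VTLN) from the pooled set, which this sketch does not attempt.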

Pages: 2705-2721

Page count: 16

50 records in total

[1] In domain training data augmentation on noise robust Punjabi Children speech recognition
Kadyan, Virender
Bawa, Puneet
Hasija, Taniya
JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 13 (5) : 2705 - 2721
[2] Noise robust in-domain children speech enhancement for automatic Punjabi recognition system under mismatched conditions
Bawa, Puneet
Kadyan, Virender
APPLIED ACOUSTICS, 2021, 175
[3] Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data
Pervaiz, Ayesha
Hussain, Fawad
Israr, Huma
Tahir, Muhammad Ali
Raja, Fawad Riasat
Baloch, Naveed Khan
Ishmanov, Farruh
Zikria, Yousaf Bin
SENSORS, 2020, 20 (08)
[4] Data Augmentation for Training Dialog Models Robust to Speech Recognition Errors
Wang, Longshaokan
Fazel-zarandi, Maryam
Tiwari, Aditya
Matsoukas, Spyros
Polymenakos, Lazaros
NLP FOR CONVERSATIONAL AI, 2020, : 63 - 70
[5] Reinforcement Learning based Data Augmentation for Noise Robust Speech Emotion Recognition
Ranjan, Sumit
Chakraborty, Rupayan
Kopparapu, Sunil Kumar
INTERSPEECH 2024, 2024, : 1040 - 1044
[6] GENERATIVE ADVERSARIAL NETWORKS BASED DATA AUGMENTATION FOR NOISE ROBUST SPEECH RECOGNITION
Hu, Hu
Tan, Tian
Qian, Yanmin
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5044 - 5048
[7] Training augmentation with TANDEM acoustic modelling in Punjabi adult speech recognition system
Kadyan, Virender
Bala, Shashi
Bawa, Puneet
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (02) : 473 - 481
[8] Training augmentation with TANDEM acoustic modelling in Punjabi adult speech recognition system
Virender Kadyan
Shashi Bala
Puneet Bawa
International Journal of Speech Technology, 2021, 24 : 473 - 481
[9] Training Augmentation with Adversarial Examples for Robust Speech Recognition
Sun, Sining
Yeh, Ching-Feng
Ostendorf, Mari
Hwang, Mei-Yuh
Xie, Lei
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2404 - 2408
[10] A STUDY ON DATA AUGMENTATION OF REVERBERANT SPEECH FOR ROBUST SPEECH RECOGNITION
Ko, Tom
Peddinti, Vijayaditya
Povey, Daniel
Seltzer, Michael L.
Khudanpur, Sanjeev
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5220 - 5224
