In domain training data augmentation on noise robust Punjabi Children speech recognition

Cited by: 0
Authors
Virender Kadyan
Puneet Bawa
Taniya Hasija
Affiliations
[1] University of Petroleum and Energy Studies (UPES), Speech and Language Research Centre, School of Computer Science
[2] Chitkara University Institute of Engineering and Technology, Centre of Excellence for Speech and Multimodal Laboratory
[3] Chitkara University
Keywords
Mel frequency-gammatone frequency cepstral coefficient (MF-GFCC); Vocal tract length normalization (VTLN); Data augmentation; Feature warping
DOI: Not available
Abstract
Building a successful automatic speech recognition (ASR) engine requires a large amount of training data. Collecting such data increases training complexity and is impractical for a low-resource language like Punjabi, which has no children's speech corpus. Consequently, data scarcity and the short vocal tract length of child speakers further degrade system performance under limited-data conditions. Moreover, Punjabi is a tonal language, which makes building an optimized ASR system for it nearly impossible. In this paper, we explore a fused feature extraction approach that handles this training complexity using a mel frequency-gammatone frequency cepstral coefficient (MF-GFCC) technique combined with a feature warping method. Efforts have been made to develop a children's ASR engine using data augmentation under limited-data scenarios. For that purpose, we study in-domain data augmentation that artificially combines noisy and clean corpora to overcome the scarcity of training data; the combined dataset is then processed with the fused feature extraction approach. In addition, the tonal characteristics and the child vocal tract length issues are addressed by inducing pitch features and a training normalization strategy based on vocal tract length normalization (VTLN). Pooling the augmented and original speech signals is observed to reduce the word error rate (WER), with relative improvements (RI) of 20.59% under noisy and 19.39% under clean environment conditions for the hybrid MF-GFCC approach compared with conventional mel frequency cepstral coefficient (MFCC) and gammatone frequency cepstral coefficient (GFCC) based ASR systems.
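As a rough illustration of the in-domain augmentation step described in the abstract (pooling clean utterances with artificially noise-corrupted copies of the same data), the Python/NumPy sketch below mixes a noise clip into a clean signal at a chosen signal-to-noise ratio. The function and variable names, the SNR value, and the placeholder signals are illustrative assumptions, not the paper's actual pipeline.

import numpy as np

def mix_noise_at_snr(clean, noise, snr_db):
    # Additively mix a noise segment into clean speech at a target SNR (dB).
    # Tile or truncate the noise so it covers the whole utterance.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    # Scale the noise so that 10*log10(P_clean / P_noise_scaled) equals snr_db.
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

# Pool the original clean utterances with their noise-corrupted copies,
# mirroring the combined "augmented + original" training set idea.
rng = np.random.default_rng(0)
clean_utts = [rng.standard_normal(16000) for _ in range(3)]  # placeholder audio
noise_clip = rng.standard_normal(8000)                       # placeholder noise
train_set = clean_utts + [mix_noise_at_snr(u, noise_clip, snr_db=10.0)
                          for u in clean_utts]

In practice the noise would come from recordings matching the target deployment conditions, and the enlarged set would then be passed to the feature extraction stage (e.g., the fused MF-GFCC front end).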
Pages: 2705-2721
Number of pages: 16
Related Papers
50 items in total
  • [21] Improving Turkish Telephone Speech Recognition with Data Augmentation and Out of Domain Data
    Uslu, Zeynep Gulhan
    Yildirim, Tulay
    2019 16TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2019, : 176 - 179
  • [22] A COMPARISON OF STREAMING MODELS AND DATA AUGMENTATION METHODS FOR ROBUST SPEECH RECOGNITION
    Kim, Jiyeon
    Kumar, Mehul
    Gowda, Dhananjaya
    Garg, Abhinav
    Kim, Chanwoo
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 989 - 995
  • [23] Data augmentation using generative adversarial networks for robust speech recognition
    Qian, Yanmin
    Hu, Hu
    Tan, Tian
    SPEECH COMMUNICATION, 2019, 114 : 1 - 9
  • [24] Analysis for Using Noise as a Source of Data Augmentation for Dysarthric Speech Recognition
    Nawroly, Sarkhell Sirwan
    Popescu, Decebal
    Celin, T. A. Mariya
    Jeeva, M. P. Actlin
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2025,
  • [25] On Practical Aspects of Multi-condition Training Based on Augmentation for Reverberation-/Noise-Robust Speech Recognition
    Malek, Jiri
    Zdansky, Jindrich
    TEXT, SPEECH, AND DIALOGUE (TSD 2019), 2019, 11697 : 251 - 263
  • [26] Robust Speech Recognition in the presence of noise using medical data
    Athanaselis, Theologos
    Bakamidis, Stelios
    Giannopoulos, George
    Dologlou, Ioannis
    Fotinea, Evita
    2008 IEEE INTERNATIONAL WORKSHOP ON IMAGING SYSTEMS AND TECHNIQUES, 2008, : 347 - 350
  • [27] Matching training and test data distributions for robust speech recognition
    Molau, S
    Keysers, D
    Ney, H
    SPEECH COMMUNICATION, 2003, 41 (04) : 579 - 601
  • [28] Modulation Spectrum Augmentation for Robust Speech Recognition
    Yan, Bi-Cheng
    Liu, Shih-Hung
    Chen, Berlin
    PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION SCIENCE AND SYSTEM, AISS 2019, 2019,
  • [29] Data Augmentation using Conditional Generative Adversarial Networks for Robust Speech Recognition
    Sheng, Peiyao
    Yang, Zhuolin
    Hu, Hu
    Tan, Tian
    Qian, Yanmin
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 121 - 125
  • [30] Cepstral domain segmental feature vector normalization for noise robust speech recognition
    Viikki, O
    Laurila, K
    SPEECH COMMUNICATION, 1998, 25 (1-3) : 133 - 147