In domain training data augmentation on noise robust Punjabi Children speech recognition

Cited by: 0
Authors
Virender Kadyan
Puneet Bawa
Taniya Hasija
Affiliations
[1] University of Petroleum and Energy Studies (UPES), Speech and Language Research Centre, School of Computer Science
[2] Chitkara University Institute of Engineering and Technology, Centre of Excellence for Speech and Multimodal Laboratory
[3] Chitkara University
Keywords
Mel frequency-Gammatone frequency cepstral coefficient (MF-GFCC); Vocal tract length normalization (VTLN); Data augmentation; Feature warping
DOI
Not available
Abstract
Building a successful automatic speech recognition (ASR) engine requires a large amount of training data. This requirement increases training complexity and cannot be met for low-resource languages such as Punjabi, for which no children's speech corpus exists. Data scarcity, together with the short vocal tract length of child speakers, further degrades system performance under limited-data conditions. Moreover, Punjabi is a tonal language, which makes building an optimized ASR system for it extremely difficult. In this paper, we explore a fused feature extraction approach, the mel frequency-gammatone frequency cepstral coefficient (MF-GFCC) technique combined with feature warping, to handle the large training complexity. We develop a children's ASR engine using data augmentation in limited-data scenarios. To this end, we study in-domain data augmentation, which artificially combines noisy and clean corpora to overcome data scarcity in the training set. The combined dataset is then processed with the fused feature extraction approach. In addition, the tonal characteristics of Punjabi and the short vocal tract length of children are addressed by incorporating pitch features and by applying a training normalization strategy based on vocal tract length normalization (VTLN). Training on the combined augmented and original speech signals with the hybrid MF-GFCC approach reduces the word error rate (WER), giving relative improvements (RI) of 20.59% under noisy and 19.39% under clean conditions over conventional mel frequency cepstral coefficient (MFCC) and gammatone frequency cepstral coefficient (GFCC) based ASR systems.
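The in-domain augmentation described in the abstract pools clean utterances with artificially noisy versions of them. The sketch below illustrates one common way to produce such noisy copies: additive mixing of a noise signal at a target signal-to-noise ratio (SNR). The abstract does not state the noise types, SNR levels, or mixing procedure the authors used, so `mix_at_snr`, `augment_corpus`, the SNR values, and the toy signals are illustrative assumptions, not the paper's method. The sketch also shows how a relative improvement (RI) in WER is conventionally computed.

```python
import math
import random
from typing import List, Sequence

# Assumption: additive noise mixing at fixed SNR levels is used here only as an
# illustration; the abstract does not specify the authors' augmentation recipe.


def signal_power(x: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Return clean + scaled noise with the requested SNR (in dB).

    The noise is tiled if it is shorter than the clean signal and truncated if
    longer, then scaled so that 10*log10(P_clean / P_added_noise) == snr_db.
    """
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean = signal_power(clean)
    p_noise = signal_power(tiled)
    if p_noise == 0.0:
        raise ValueError("noise signal has zero power")
    # Noise power needed for the target SNR, converted to an amplitude gain.
    target_p_noise = p_clean / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_p_noise / p_noise)
    return [c + gain * n for c, n in zip(clean, tiled)]


def augment_corpus(clean_utts: Sequence[Sequence[float]],
                   noise: Sequence[float],
                   snr_levels_db: Sequence[float]) -> List[List[float]]:
    """Pool the original clean utterances with one noisy copy per SNR level."""
    pooled = [list(u) for u in clean_utts]  # keep the original (clean) data
    for utt in clean_utts:
        for snr in snr_levels_db:
            pooled.append(mix_at_snr(utt, noise, snr))
    return pooled


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical data: a 0.1 s, 440 Hz tone at 16 kHz and short Gaussian noise.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    noise = [random.gauss(0.0, 1.0) for _ in range(400)]
    corpus = augment_corpus([clean], noise, snr_levels_db=[0.0, 5.0, 10.0])
    print(len(corpus))  # 1 original + 3 noisy copies
    print(round(relative_improvement(20.0, 16.0), 2))
```

Keeping the clean originals alongside their noisy copies, rather than replacing them, matches the abstract's point that training on combined augmented and original signals helps in both clean and noisy conditions. With hypothetical WERs of 20.0% (baseline) and 16.0% (new system), `relative_improvement` returns 20.0.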
Pages: 2705-2721
Number of pages: 16
Related papers
50 in total
  • [1] In domain training data augmentation on noise robust Punjabi Children speech recognition
    Kadyan, Virender
    Bawa, Puneet
    Hasija, Taniya
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 13 (5) : 2705 - 2721
  • [2] Noise robust in-domain children speech enhancement for automatic Punjabi recognition system under mismatched conditions
    Bawa, Puneet
    Kadyan, Virender
    APPLIED ACOUSTICS, 2021, 175
  • [3] Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data
    Pervaiz, Ayesha
    Hussain, Fawad
    Israr, Huma
    Tahir, Muhammad Ali
    Raja, Fawad Riasat
    Baloch, Naveed Khan
    Ishmanov, Farruh
    Zikria, Yousaf Bin
    SENSORS, 2020, 20 (08)
  • [4] Data Augmentation for Training Dialog Models Robust to Speech Recognition Errors
    Wang, Longshaokan
    Fazel-zarandi, Maryam
    Tiwari, Aditya
    Matsoukas, Spyros
    Polymenakos, Lazaros
    NLP FOR CONVERSATIONAL AI, 2020 : 63 - 70
  • [5] Reinforcement Learning based Data Augmentation for Noise Robust Speech Emotion Recognition
    Ranjan, Sumit
    Chakraborty, Rupayan
    Kopparapu, Sunil Kumar
    INTERSPEECH 2024, 2024 : 1040 - 1044
  • [6] Generative Adversarial Networks Based Data Augmentation for Noise Robust Speech Recognition
    Hu, Hu
    Tan, Tian
    Qian, Yanmin
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018 : 5044 - 5048
  • [7] Training augmentation with TANDEM acoustic modelling in Punjabi adult speech recognition system
    Kadyan, Virender
    Bala, Shashi
    Bawa, Puneet
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (02) : 473 - 481
  • [9] Training Augmentation with Adversarial Examples for Robust Speech Recognition
    Sun, Sining
    Yeh, Ching-Feng
    Ostendorf, Mari
    Hwang, Mei-Yuh
    Xie, Lei
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018 : 2404 - 2408
  • [10] A Study on Data Augmentation of Reverberant Speech for Robust Speech Recognition
    Ko, Tom
    Peddinti, Vijayaditya
    Povey, Daniel
    Seltzer, Michael L.
    Khudanpur, Sanjeev
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017 : 5220 - 5224