Training augmentation with TANDEM acoustic modelling in Punjabi adult speech recognition system

Cited by: 0
Authors
Virender Kadyan
Shashi Bala
Puneet Bawa
Affiliations
[1] Department of Informatics, School of Computer Science, University of Petroleum & Energy Studies (UPES)
[2] Centre of Excellence for Speech and Multimodal Laboratory, Chitkara University Institute of Engineering and Technology
[3] Chitkara University
Keywords
Tandem-NN; Data augmentation; Bottleneck features; Punjabi ASR; DNN-HMM
DOI: Not available
Abstract
Processing of pre- and post-acoustic signals in low-resource languages has always faced the challenge of data scarcity in the training module. It is difficult to obtain high system accuracy with a limited training corpus, which leads to the extraction of large discriminative feature vectors whose information is distorted by the acoustic mismatch arising from real environments and inter-speaker variations. In this paper, context-independent information of the input speech signal is pre-processed using bottleneck features, and a Tandem-NN model is then employed in the modelling phase to enhance system accuracy. To address the shortage of training data, in-domain training augmentation is performed by fusing the original clean data with artificially created noisy training data; this training set is further enlarged by tempo modification of the input speech signal while maintaining its spectral envelope and pitch. Experimental results show that the Tandem-NN system achieves a relative improvement of 13.53% under clean conditions and 32.43% under noisy conditions over the baseline system.
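A minimal sketch of the tandem/bottleneck idea referenced in the abstract, not the authors' implementation: a narrow "bottleneck" hidden layer in a phone classifier is read out after training and concatenated with the conventional acoustic feature vector, and the fused feature then feeds the HMM-based acoustic model. The PyTorch layer sizes and phone-set size below are illustrative assumptions.

# Hypothetical bottleneck/tandem feature sketch (PyTorch); sizes are assumptions.
import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    def __init__(self, n_mfcc=39, n_hidden=512, n_bottleneck=40, n_phones=42):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_mfcc, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_bottleneck)       # narrow bottleneck layer
        )
        self.classifier = nn.Sequential(
            nn.ReLU(), nn.Linear(n_bottleneck, n_phones)  # phone targets used only for training
        )

    def forward(self, x):
        bn = self.encoder(x)              # bottleneck activations = NN-derived features
        return self.classifier(bn), bn

# After training on phone labels, keep only the bottleneck output and append it
# to the original acoustic features to form the tandem feature vector.
mfcc = torch.randn(100, 39)                    # 100 frames of placeholder MFCCs
_, bottleneck = BottleneckNet()(mfcc)
tandem = torch.cat([mfcc, bottleneck], dim=1)  # 39 + 40 = 79-dim tandem features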
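The tempo-modification augmentation described above can be illustrated with phase-vocoder time stretching, which changes speaking rate without altering pitch. The sketch below uses librosa and soundfile as an assumed toolchain; the perturbation factors are illustrative, not the values used in the paper.

# Hypothetical tempo-perturbation augmentation sketch; rates 0.9/1.1 are assumptions.
import librosa
import soundfile as sf

def augment_with_tempo(in_wav, out_prefix, rates=(0.9, 1.1)):
    """Write tempo-modified copies of one utterance, pitch unchanged."""
    y, sr = librosa.load(in_wav, sr=None)                    # keep original sampling rate
    for rate in rates:
        y_mod = librosa.effects.time_stretch(y, rate=rate)   # tempo changes, pitch preserved
        sf.write(f"{out_prefix}_tempo{rate}.wav", y_mod, sr)

# Example: pool the clean utterance with its tempo-modified (and, per the paper,
# artificially noised) copies to enlarge the training set.
# augment_with_tempo("utt001.wav", "utt001")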
Pages: 473-481
Number of pages: 8
Related papers
50 records in total
  • [41] Combined Acoustic and Pronunciation Modelling for Non-Native Speech Recognition
    Bouselmi, G.
    Fohr, D.
    Illina, I.
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1209 - +
  • [42] Enhancing accuracy of long contextual dependencies for Punjabi speech recognition system using deep LSTM
    Kadyan, Virender
    Dua, Mohit
    Dhiman, Poonam
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (02) : 517 - 527
  • [43] Enhancing accuracy of long contextual dependencies for Punjabi speech recognition system using deep LSTM
    Virender Kadyan
    Mohit Dua
    Poonam Dhiman
    International Journal of Speech Technology, 2021, 24 : 517 - 527
  • [44] An Experimental Study of Continuous Automatic Speech Recognition System Using MFCC with Reference to Punjabi Language
    Bassan, Nancy
    Kadyan, Virender
    RECENT FINDINGS IN INTELLIGENT COMPUTING TECHNIQUES, VOL 1, 2019, 707 : 267 - 275
  • [45] Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched conditions
    Puneet Bawa
    Virender Kadyan
    Abinash Tripathy
    Thipendra P. Singh
    Complex & Intelligent Systems, 2023, 9 : 1 - 23
  • [46] Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched conditions
    Bawa, Puneet
    Kadyan, Virender
    Tripathy, Abinash
    Singh, Thipendra P.
    COMPLEX & INTELLIGENT SYSTEMS, 2023, 9 (01) : 1 - 23
  • [47] Optimizing Feature Extraction Techniques Constituting Phone Based Modelling on Connected Words for Punjabi Automatic Speech Recognition
    Kaur, Arshpreet
    Singh, Amitoj
    2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 2104 - 2108
  • [48] Research on English speech recognition system and training enhancement based on bat algorithm and acoustic model inspection
    Yang, Xi
    Li, Ling
    SOFT COMPUTING, 2023,
  • [49] Acoustic data augmentation for Mandarin-English code-switching speech recognition
    Long, Yanhua
    Li, Yijie
    Zhang, Qiaozheng
    Wei, Shuang
    Ye, Hong
    Yang, Jichen
    APPLIED ACOUSTICS, 2020, 161
  • [50] Unsupervised training of acoustic models for large vocabulary continuous speech recognition
    Wessel, F
    Ney, H
    ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001, : 307 - 310