Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning

Cited by: 11
Authors
Humayun, Mohammad Ali [1 ]
Hameed, Ibrahim A. [2 ]
Shah, Syed Muslim [1 ]
Khan, Sohaib Hassan [1 ]
Zafar, Irfan [1 ]
Bin Ahmed, Saad [3 ]
Shuja, Junaid [4 ]
Affiliations
[1] Univ Engn & Technol Peshawar, Dept Elect Engn, Inst Commun Technol ICT Campus, Islamabad 44000, Pakistan
[2] Norwegian Univ Sci & Technol, Fac Informat Technol & Elect Engn, Dept ICT & Nat Sci, N-6001 Alesund, Norway
[3] Univ Teknol Malaysia, MJIIT, Jalan Sultan Yahya Petra, Kuala Lumpur 54100, Malaysia
[4] COMSATS Univ Islamabad, Dept Comp Sci, Abbottabad Campus, Abbottabad 22010, Pakistan
Source
APPLIED SCIENCES-BASEL, 2019, Vol. 9, Issue 9
Keywords
speech recognition; locally linear embedding; label propagation; Maxout; low resource languages
DOI
10.3390/app9091956
Chinese Library Classification
O6 [Chemistry]
Subject Classification Code
0703
Abstract
Automatic Speech Recognition (ASR) has achieved its best results for English with end-to-end, neural-network-based supervised models. These supervised models need huge amounts of labeled speech data to generalize well, which is difficult to obtain for low-resource languages such as Urdu. Most models proposed for Urdu ASR are based on Hidden Markov Models (HMMs). This paper proposes an end-to-end neural network model for Urdu ASR, regularized with dropout, ensemble averaging, and Maxout units. Dropout and ensembles are averaging techniques over multiple neural network models, while Maxout units adapt their activation functions. Because labeled data is limited, Semi-Supervised Learning (SSL) techniques are also incorporated to improve model generalization. Speech features are transformed into a lower-dimensional manifold using an unsupervised dimensionality-reduction technique called Locally Linear Embedding (LLE). The transformed data, along with the higher-dimensional features, is used to train the neural networks. The proposed model also uses label-propagation-based self-training of the initially trained models and achieves a Word Error Rate (WER) 4% lower than the benchmark reported on the same Urdu corpus using HMMs. The decrease in WER after incorporating SSL becomes more pronounced as the validation data size increases.
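As a rough illustration of the feature-level pipeline described in the abstract, the sketch below combines LLE dimensionality reduction, feature concatenation, and label-propagation pseudo-labeling with scikit-learn, plus a toy Maxout activation. The data shapes, class count, neighbor counts, and the maxout helper are placeholder assumptions; this is not the paper's code and does not reproduce its corpus, acoustic features, or network.

    import numpy as np
    from sklearn.manifold import LocallyLinearEmbedding
    from sklearn.semi_supervised import LabelPropagation

    def maxout(z, k=2):
        # Maxout activation: each output unit is the max over k linear pieces.
        # z has shape (batch, units * k); hypothetical helper for illustration.
        return z.reshape(z.shape[0], -1, k).max(axis=-1)

    rng = np.random.default_rng(0)

    # Placeholder acoustic features: 200 frames x 39 dims (e.g. MFCCs with deltas).
    X = rng.normal(size=(200, 39))

    # Only the first 50 frames are labeled; -1 marks unlabeled frames,
    # with 10 hypothetical phone classes.
    y = np.full(200, -1)
    y[:50] = rng.integers(0, 10, size=50)

    # 1) Unsupervised dimensionality reduction with Locally Linear Embedding.
    lle = LocallyLinearEmbedding(n_neighbors=10, n_components=8)
    X_low = lle.fit_transform(X)

    # 2) Concatenate the low-dimensional embedding with the original
    #    higher-dimensional features, as the abstract describes.
    X_combined = np.hstack([X, X_low])

    # 3) Label propagation assigns pseudo-labels to the unlabeled frames;
    #    these would drive self-training of the regularized neural network.
    lp = LabelPropagation(kernel="rbf", gamma=0.5)
    lp.fit(X_combined, y)
    pseudo_labels = lp.transduction_

    print("frames pseudo-labeled:", int(np.sum(y == -1)))
    print("Maxout output shape:", maxout(rng.normal(size=(4, 16)), k=2).shape)

In the paper itself, the pseudo-labeled data would feed the training of the dropout-, ensemble-, and Maxout-regularized end-to-end network; the calls above only illustrate the LLE and label-propagation components.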
Pages: 15
Related Papers
50 records in total
  • [31] Semi-Supervised Speech Emotion Recognition with Ladder Networks. Parthasarathy, Srinivas; Busso, Carlos. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2020, 28: 2697-2709.
  • [32] Semi-supervised Ladder Networks for Speech Emotion Recognition. Tao, Jian-Hua; Huang, Jian; Li, Ya; Lian, Zheng; Niu, Ming-Yue. International Journal of Automation and Computing, 2019, 16(4): 437-448.
  • [33] Contrastive Siamese Network for Semi-Supervised Speech Recognition. Khorram, Soheil; Kim, Jaeyoung; Tripathi, Anshuman; Lu, Han; Zhang, Qian; Sak, Hasim. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 7207-7211.
  • [34] Semi-Supervised End-to-End Speech Recognition. Karita, Shigeki; Watanabe, Shinji; Iwata, Tomoharu; Ogawa, Atsunori; Delcroix, Marc. 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Vols 1-6: Speech Research for Emerging Markets in Multilingual Societies, 2018: 2-6.
  • [35] A novel semi-supervised learning for face recognition. Gao, Quanxue; Huang, Yunfang; Gao, Xinbo; Shen, Weiguo; Zhang, Hailin. Neurocomputing, 2015, 152: 69-76.
  • [36] Semi-Supervised Learning for Musical Instrument Recognition. Diment, Aleksandr; Heittola, Toni; Virtanen, Tuomas. 2013 Proceedings of the 21st European Signal Processing Conference (EUSIPCO), 2013.
  • [37] Image Recognition and Analysis of Intrauterine Residues Based on Deep Learning and Semi-Supervised Learning. Tao, Tao; Liu, Kan; Wang, Li; Wu, Haiying. IEEE Access, 2020, 8: 162785-162799.
  • [38] Deep Neural Network Features and Semi-Supervised Training for Low Resource Speech Recognition. Thomas, Samuel; Seltzer, Michael L.; Church, Kenneth; Hermansky, Hynek. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013: 6704-6708.
  • [39] Semi-supervised Learning Regularized by Adversarial Perturbation and Diversity Maximization. Liu, Peng; Zheng, Guoyan. Machine Learning in Medical Imaging, MLMI 2021, 2021, 12966: 199-208.
  • [40] Regularized Semi-Supervised Metric Learning with Latent Structure Preserved. Wang, Qianying; Lu, Ming; Li, Meng; Guan, Fei. International Journal of Computational Intelligence and Applications, 2021, 20(2).