Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning

Cited by: 11
Authors
Humayun, Mohammad Ali [1]
Hameed, Ibrahim A. [2]
Shah, Syed Muslim [1]
Khan, Sohaib Hassan [1]
Zafar, Irfan [1]
Bin Ahmed, Saad [3]
Shuja, Junaid [4]
Affiliations
[1] Univ Engn & Technol Peshawar, Dept Elect Engn, Inst Commun Technol ICT Campus, Islamabad 44000, Pakistan
[2] Norwegian Univ Sci & Technol, Fac Informat Technol & Elect Engn, Dept ICT & Nat Sci, N-6001 Alesund, Norway
[3] Univ Teknol Malaysia, M JIIT, Jalan Sultan Yahya Petra, Kuala Lumpur 54100, Malaysia
[4] COMSATS Univ Islamabad, Dept Comp Sci, Abbottabad Campus, Abbottabad 22010, Pakistan
Source
APPLIED SCIENCES-BASEL | 2019 / Vol. 9 / Issue 9
Keywords
speech recognition; locally linear embedding; label propagation; Maxout; low-resource languages
DOI
10.3390/app9091956
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Automatic Speech Recognition (ASR) has achieved its best results for English, with end-to-end neural-network-based supervised models. These supervised models need large amounts of labeled speech data to generalize well, which is difficult to obtain for low-resource languages such as Urdu. Most models proposed for Urdu ASR are based on Hidden Markov Models (HMMs). This paper proposes an end-to-end neural network model for Urdu ASR, regularized with dropout, ensemble averaging, and Maxout units. Dropout and ensembles are averaging techniques over multiple neural network models, while Maxout units are neural network units that adapt their activation functions. Because labeled data is limited, Semi-Supervised Learning (SSL) techniques are also incorporated to improve model generalization. Speech features are transformed onto a lower-dimensional manifold using an unsupervised dimensionality-reduction technique called Locally Linear Embedding (LLE). The transformed data, along with the higher-dimensional features, is used to train the neural networks. The proposed model also uses label-propagation-based self-training of the initially trained models and achieves a Word Error Rate (WER) 4% lower than the benchmark reported on the same Urdu corpus using HMMs. The decrease in WER after incorporating SSL is more significant with a larger validation data size.
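Two terms in the abstract can be made concrete with a minimal Python sketch. It is illustrative only and is not taken from the paper: the function names `maxout` and `word_error_rate` are hypothetical, the Maxout unit is written in its usual textbook form (the maximum over several affine pieces), and WER is computed as word-level edit distance divided by the number of reference words.

```python
def maxout(x, weights, biases):
    """Maxout unit (textbook form): max over k affine pieces w_k . x + b_k.

    `weights` is a list of k weight vectors and `biases` a list of k scalars.
    These names are illustrative assumptions, not the paper's notation.
    """
    return max(
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    )


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

With two pieces whose weights are `+1` and `-1` and zero biases, `maxout` reduces to the absolute value, which shows how the unit's effective activation function depends on its learned weights.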
Pages: 15