Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning

Cited by: 11
Authors
Humayun, Mohammad Ali [1]
Hameed, Ibrahim A. [2]
Shah, Syed Muslim [1]
Khan, Sohaib Hassan [1]
Zafar, Irfan [1]
Bin Ahmed, Saad [3]
Shuja, Junaid [4]
Affiliations
[1] Univ Engn & Technol Peshawar, Dept Elect Engn, Inst Commun Technol ICT Campus, Islamabad 44000, Pakistan
[2] Norwegian Univ Sci & Technol, Fac Informat Technol & Elect Engn, Dept ICT & Nat Sci, N-6001 Alesund, Norway
[3] Univ Teknol Malaysia, MJIIT, Jalan Sultan Yahya Petra, Kuala Lumpur 54100, Malaysia
[4] COMSATS Univ Islamabad, Dept Comp Sci, Abbottabad Campus, Abbottabad 22010, Pakistan
Source
APPLIED SCIENCES-BASEL | 2019 / Vol. 9 / Issue 09
Keywords
speech recognition; locally linear embedding; label propagation; Maxout; low resource languages;
DOI
10.3390/app9091956
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline code
0703
Abstract
Automatic Speech Recognition (ASR) has achieved its best results for English with end-to-end, neural-network-based supervised models. These supervised models need large amounts of labeled speech data to generalize well, which can be difficult to obtain for low-resource languages such as Urdu. Most models proposed for Urdu ASR are based on Hidden Markov Models (HMMs). This paper proposes an end-to-end neural network model for Urdu ASR, regularized with dropout, ensemble averaging and Maxout units. Dropout and ensembles are techniques that average over multiple neural network models, while Maxout units are neural network units that adapt their activation functions. Because labeled data are limited, Semi-Supervised Learning (SSL) techniques are also incorporated to improve model generalization. Speech features are transformed into a lower-dimensional manifold using an unsupervised dimensionality-reduction technique called Locally Linear Embedding (LLE). The transformed data, together with the higher-dimensional features, are used to train the neural networks. The proposed model also uses label propagation-based self-training of the initially trained models and achieves a Word Error Rate (WER) 4% lower than the benchmark reported on the same Urdu corpus using HMMs. The reduction in WER after incorporating SSL becomes more significant as the validation data size increases.
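The abstract describes Maxout units as units that adapt their activation functions. As an illustration only (a sketch, not the authors' implementation), a Maxout unit computes k affine functions of its input and outputs their maximum, so training the weights effectively learns a piecewise-linear activation. The function name `maxout` and the nested-list weight layout are assumptions made for this example.

```python
from typing import List, Sequence


def maxout(x: Sequence[float],
           weights: List[List[List[float]]],
           biases: List[List[float]]) -> List[float]:
    """Forward pass of a Maxout layer (illustrative sketch).

    For output unit i with k pieces, weights[i][j] is the weight vector
    and biases[i][j] the bias of piece j (hypothetical layout). The unit
    returns the maximum of the k affine pieces z_ij = w_ij . x + b_ij.
    """
    outputs = []
    for unit_weights, unit_biases in zip(weights, biases):
        # Evaluate every affine piece of this unit on the input.
        pieces = [sum(w * xi for w, xi in zip(w_vec, x)) + b
                  for w_vec, b in zip(unit_weights, unit_biases)]
        # The activation is the largest piece: a learned convex,
        # piecewise-linear function of the input.
        outputs.append(max(pieces))
    return outputs


# Two pieces with slopes +1 and -1 (zero biases) reproduce |x|.
print(maxout([-3.0], [[[1.0], [-1.0]]], [[0.0, 0.0]]))  # [3.0]
```

With slopes 1 and 0 the same unit reproduces a ReLU, which shows how the learned weights, rather than a fixed formula, determine the shape of the activation.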
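Label propagation, which the abstract uses for self-training, spreads class labels from a few labeled points to their unlabeled neighbors over a similarity graph. The following is a minimal sketch of the classic iterative scheme on a Gaussian-kernel graph. The abstract does not say how the graph is built or how pseudo-labels are fed back into training, so the kernel width `sigma`, the iteration count and the function name `label_propagation` are assumptions.

```python
import math
from typing import Dict, List, Sequence


def label_propagation(points: Sequence[Sequence[float]],
                      labels: Dict[int, int],
                      num_classes: int,
                      sigma: float = 1.0,
                      iterations: int = 100) -> List[int]:
    """Iterative label propagation on a Gaussian-kernel graph (sketch).

    `labels` maps the indices of labeled points to their class; every
    other point is treated as unlabeled.
    """
    n = len(points)

    def affinity(a: Sequence[float], b: Sequence[float]) -> float:
        # Gaussian (RBF) similarity; sigma is an assumed bandwidth.
        d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-d2 / (2.0 * sigma ** 2))

    def one_hot(c: int) -> List[float]:
        return [1.0 if k == c else 0.0 for k in range(num_classes)]

    # Weighted graph without self-loops, row-normalised into P = D^-1 W.
    W = [[0.0 if i == j else affinity(points[i], points[j]) for j in range(n)]
         for i in range(n)]
    P = []
    for row in W:
        total = sum(row) or 1.0  # guard against isolated points
        P.append([w / total for w in row])

    # Labeled points start one-hot, unlabeled points start uniform.
    F = [one_hot(labels[i]) if i in labels else [1.0 / num_classes] * num_classes
         for i in range(n)]
    for _ in range(iterations):
        # Each point takes the weighted average of its neighbours' labels.
        F = [[sum(P[i][j] * F[j][c] for j in range(n)) for c in range(num_classes)]
             for i in range(n)]
        # Clamp the labeled points back to their known classes.
        for i, c in labels.items():
            F[i] = one_hot(c)
    # Hard label = most probable class for each point.
    return [max(range(num_classes), key=lambda c: F[i][c]) for i in range(n)]


# Two well-separated clusters, one labeled point in each.
print(label_propagation([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]],
                        {0: 0, 3: 1}, num_classes=2))
```

In a self-training loop, the labels inferred for unlabeled utterances could serve as pseudo-labels for retraining the initially trained model; the selection criterion the paper uses is not given in this record.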
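The reported improvement is measured in Word Error Rate. Under the standard definition, WER is the word-level edit distance between hypothesis and reference, counting substitutions S, deletions D and insertions I, divided by the number of reference words N: WER = (S + D + I) / N. A minimal sketch follows; the function name is hypothetical, and splitting on whitespace is an assumed tokenization.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance (sketch)."""
    ref: List[str] = reference.split()  # assumption: whitespace tokenization
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)


# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.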
Pages: 15
Related papers (50 in total)
  • [21] An active semi-supervised deep learning model for human activity recognition
    Bi, Haixia
    Perello-Nieto, Miquel
    Santos-Rodriguez, Raul
    Flach, Peter
    Craddock, Ian
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2022, 14 (10) : 13049 - 13065
  • [23] Semi-supervised Learning of Deep Difference Features for Facial Expression Recognition
    Xu, Can
    Xu, Ruyi
    Chen, Jingying
    Liu, Leyuan
    PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2018, 11258 : 245 - 254
  • [24] Deep Recurrent Semi-Supervised EEG Representation Learning for Emotion Recognition
    Zhang, Guangyi
    Etemad, Ali
    2021 9TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2021,
  • [25] Semi-Supervised Multichannel Speech Enhancement With a Deep Speech Prior
    Sekiguchi, Kouhei
    Bando, Yoshiaki
    Nugraha, Aditya Arie
    Yoshii, Kazuyoshi
    Kawahara, Tatsuya
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (12) : 2197 - 2212
  • [26] Graph Based Semi-supervised Learning Methods Applied to Speech Recognition Problem
    Hoang Trang
    Tran, Loc Hoang
    NATURE OF COMPUTATION AND COMMUNICATION, 2015, 144 : 264 - 273
  • [27] SPEECH EMOTION RECOGNITION USING SEMI-SUPERVISED LEARNING WITH EFFICIENT LABELING STRATEGIES
    Zhu, Zhi
    Sato, Yoshinao
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 358 - 365
  • [28] Semi-supervised Ladder Networks for Speech Emotion Recognition
    Tao, Jian-Hua
    Huang, Jian
    Li, Ya
    Lian, Zheng
    Niu, Ming-Yue
    INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, 2019, 16 (04) : 437 - 448
  • [29] Semi-Supervised Speech Emotion Recognition With Ladder Networks
    Parthasarathy, Srinivas
    Busso, Carlos
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 2697 - 2709