Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning

Cited by: 11

Authors:

Humayun, Mohammad Ali ^{[1]}

Hameed, Ibrahim A. ^{[2]}

Shah, Syed Muslim ^{[1]}

Khan, Sohaib Hassan ^{[1]}

Zafar, Irfan ^{[1]}

Bin Ahmed, Saad ^{[3]}

Shuja, Junaid ^{[4]}

Affiliations:

[1] Univ Engn & Technol Peshawar, Dept Elect Engn, Inst Commun Technol ICT Campus, Islamabad 44000, Pakistan

[2] Norwegian Univ Sci & Technol, Fac Informat Technol & Elect Engn, Dept ICT & Nat Sci, N-6001 Alesund, Norway

[3] Univ Teknol Malaysia, MJIIT, Jalan Sultan Yahya Petra, Kuala Lumpur 54100, Malaysia

[4] COMSATS Univ Islamabad, Dept Comp Sci, Abbottabad Campus, Abbottabad 22010, Pakistan

Source:

APPLIED SCIENCES-BASEL | 2019 / Vol. 9 / Issue 09

Keywords:

speech recognition; locally linear embedding; label propagation; Maxout; low resource languages;

DOI:

10.3390/app9091956

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

Automatic Speech Recognition (ASR) has achieved its best results for English with end-to-end, neural-network-based supervised models. These supervised models need large amounts of labeled speech data to generalize well, which is difficult to obtain for low-resource languages such as Urdu. Most models proposed for Urdu ASR are based on Hidden Markov Models (HMMs). This paper proposes an end-to-end neural network model for Urdu ASR, regularized with dropout, ensemble averaging, and Maxout units. Dropout and ensembles are averaging techniques over multiple neural network models, while Maxout units adapt their own activation functions. Because labeled data is limited, Semi-Supervised Learning (SSL) techniques are also incorporated to improve model generalization. Speech features are transformed onto a lower-dimensional manifold using an unsupervised dimensionality-reduction technique called Locally Linear Embedding (LLE). The transformed data, together with the higher-dimensional features, is used to train the neural networks. The proposed model also uses label-propagation-based self-training of the initially trained models and achieves a Word Error Rate (WER) 4% lower than the benchmark reported on the same Urdu corpus using HMMs. The decrease in WER after incorporating SSL is more significant with an increased validation data size.
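Two terms in the abstract can be made concrete with a short sketch. It is illustrative only and is not taken from the paper: the function names `maxout` and `word_error_rate`, the toy weights, and the sample word lists are all assumptions. A Maxout unit outputs the maximum over several affine pieces of its input, which is how it can adapt the shape of its activation. WER is taken here in its usual sense, the word-level edit distance between reference and hypothesis divided by the reference length.

```python
from typing import List, Sequence


def maxout(x: Sequence[float],
           weights: Sequence[Sequence[float]],
           biases: Sequence[float]) -> float:
    """One Maxout unit: the maximum over k affine pieces w_j . x + b_j."""
    pieces = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return max(pieces)


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical inputs: with pieces x and -x, the unit behaves like |x|.
abs_like = maxout([-3.0], weights=[[1.0], [-1.0]], biases=[0.0, 0.0])
# One substituted word and one dropped word out of four reference words.
wer = word_error_rate(["a", "b", "c", "d"], ["a", "x", "c"])
```

With the two pieces `x` and `-x`, the unit reproduces the absolute-value function; trained weights would shape a different piecewise-linear activation.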


Pages: 15

50 items in total

[21] An active semi-supervised deep learning model for human activity recognition
Bi, Haixia
Perello-Nieto, Miquel
Santos-Rodriguez, Raul
Flach, Peter
Craddock, Ian
JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2022, 14 (10) : 13049 - 13065
[22] An active semi-supervised deep learning model for human activity recognition
Haixia Bi
Miquel Perello-Nieto
Raul Santos-Rodriguez
Peter Flach
Ian Craddock
Journal of Ambient Intelligence and Humanized Computing, 2023, 14 : 13049 - 13065
[23] Semi-supervised Learning of Deep Difference Features for Facial Expression Recognition
Xu, Can
Xu, Ruyi
Chen, Jingying
Liu, Leyuan
PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2018, 11258 : 245 - 254
[24] Deep Recurrent Semi-Supervised EEG Representation Learning for Emotion Recognition
Zhang, Guangyi
Etemad, Ali
2021 9TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2021,
[25] Semi-Supervised Multichannel Speech Enhancement With a Deep Speech Prior
Sekiguchi, Kouhei
Bando, Yoshiaki
Nugraha, Aditya Arie
Yoshii, Kazuyoshi
Kawahara, Tatsuya
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (12) : 2197 - 2212
[26] Graph Based Semi-supervised Learning Methods Applied to Speech Recognition Problem
Hoang Trang
Tran, Loc Hoang
NATURE OF COMPUTATION AND COMMUNICATION, 2015, 144 : 264 - 273
[27] SPEECH EMOTION RECOGNITION USING SEMI-SUPERVISED LEARNING WITH EFFICIENT LABELING STRATEGIES
Zhu, Zhi
Sato, Yoshinao
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 358 - 365
[28] Semi-supervised Ladder Networks for Speech Emotion Recognition
Jian-Hua Tao
Jian Huang
Ya Li
Zheng Lian
Ming-Yue Niu
International Journal of Automation and Computing, 2019, 16 : 437 - 448
[29] Semi-Supervised Speech Emotion Recognition With Ladder Networks
Parthasarathy, Srinivas
Busso, Carlos
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 2697 - 2709
[30] Semi-supervised Ladder Networks for Speech Emotion Recognition
Tao, Jian-Hua
Huang, Jian
Li, Ya
Lian, Zheng
Niu, Ming-Yue
INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, 2019, 16 (04) : 437 - 448
