Development of Language Models for Continuous Uzbek Speech Recognition System

Cited by: 6

Authors:

Mukhamadiyev, Abdinabi [1]

Mukhiddinov, Mukhriddin [1]

Khujayarov, Ilyos [2]

Ochilov, Mannon [3]

Cho, Jinsoo [1]

Affiliations:

[1] Gachon Univ, Dept Comp Engn, Seongnam Si 13120, South Korea

[2] Tashkent Univ Informat Technol, Dept Informat Technol, Samarkand Branch, Tashkent 140100, Uzbekistan

[3] Tashkent Univ Informat Technol, Dept Artificial Intelligence, Tashkent 100200, Uzbekistan

Source:

SENSORS | 2023 / Vol. 23 / No. 03

Funding:

National Research Foundation of Singapore;

Keywords:

language model; Uzbek speech; recurrent neural networks; automatic speech recognition; neural networks; character-based language models; word-based language models;

DOI:

10.3390/s23031145

Chinese Library Classification:

O65 [Analytical Chemistry];

Subject classification codes:

070302 ; 081704 ;

Abstract:

Automatic speech recognition systems with a large vocabulary and other natural language processing applications cannot operate without a language model. Most studies on pre-trained language models have focused on more popular languages such as English, Chinese, and various European languages, but there is no publicly available Uzbek speech dataset. Therefore, language models of low-resource languages need to be studied and created. The objective of this study is to address this limitation by developing a low-resource language model for the Uzbek language and understanding linguistic occurrences. We proposed the Uzbek language model named UzLM by examining the performance of statistical and neural-network-based language models that account for the unique features of the Uzbek language. Our Uzbek-specific linguistic representation allows us to construct a more robust UzLM, utilizing 80 million words from various sources while using the same or fewer training words, as applied in previous studies. Roughly sixty-eight thousand different words and 15 million sentences were collected for the creation of this corpus. The experimental results of our tests on the continuous recognition of Uzbek speech show that, compared with manual encoding, the use of neural-network-based language models reduced the character error rate to 5.26%.

Pages: 22
