Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

Cited by: 0

Authors:

Kuzdeuov, Askat [1]

Nurgaliyev, Shakhizat [1]

Turmakhan, Diana [1]

Laiyk, Nurkhan [1]

Varol, Huseyin Atakan [1]

Affiliations:

[1] Nazarbayev Univ, Inst Smart Syst & AI, Astana, Kazakhstan

Source:

2023 3RD INTERNATIONAL CONFERENCE ON ROBOTICS, AUTOMATION AND ARTIFICIAL INTELLIGENCE, RAAI 2023 | 2023

Keywords:

Speech commands recognition; text-to-speech; Kazakh Speech Corpus; voice commands; data-centric AI;

DOI:

10.1109/RAAI59955.2023.10601292

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

Speech Command Recognition (SCR) is rapidly gaining prominence due to its diverse applications, such as virtual assistants, smart homes, hands-free navigation, and voice-controlled industrial machinery. In this paper, we present a data-centric approach to creating SCR systems for low-resource languages, focusing in particular on the Kazakh language. By leveraging synthetic data generated by Text-to-Speech (TTS) and data extracted from a large-scale speech corpus, we created the Kazakh-language equivalent of the Google Speech Commands dataset. We also compiled the Kazakh Speech Commands dataset from data collected from 119 participants. This dataset was used to benchmark the performance of the Keyword-MLP model trained on our synthetic dataset. The model achieves 89.79% accuracy on the real-world data, demonstrating the efficacy of our approach. Our work can serve as a recipe for creating customized speech command datasets, including for low-resource languages, obviating the need for laborious and costly human data collection.


Pages: 286-291

Page count: 6
