Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

Authors
Kuzdeuov, Askat [1]
Nurgaliyev, Shakhizat [1]
Turmakhan, Diana [1]
Laiyk, Nurkhan [1]
Varol, Huseyin Atakan [1]
Affiliations
[1] Nazarbayev Univ, Inst Smart Syst & AI, Astana, Kazakhstan
Keywords
Speech commands recognition; text-to-speech; Kazakh Speech Corpus; voice commands; data-centric AI
DOI
10.1109/RAAI59955.2023.10601292
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Speech Command Recognition (SCR) is rapidly gaining prominence due to its diverse applications, such as virtual assistants, smart homes, hands-free navigation, and voice-controlled industrial machinery. In this paper, we present a data-centric approach to creating SCR systems for low-resource languages, focusing on the Kazakh language. By leveraging synthetic data generated by Text-to-Speech (TTS) and data extracted from a large-scale speech corpus, we created the Kazakh-language equivalent of the Google Speech Commands dataset. We also compiled the Kazakh Speech Commands dataset from data collected from 119 participants. This dataset was used to benchmark the performance of the Keyword-MLP model trained on our synthetic dataset. The model achieves 89.79% accuracy on the real-world data, demonstrating the efficacy of our approach. Our work can serve as a recipe for creating customized speech command datasets, including for low-resource languages, obviating the need for laborious and costly human data collection.
Pages: 286-291
Number of pages: 6