Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

Cited by: 0
Authors
Kuzdeuov, Askat [1 ]
Nurgaliyev, Shakhizat [1 ]
Turmakhan, Diana [1 ]
Laiyk, Nurkhan [1 ]
Varol, Huseyin Atakan [1 ]
Affiliations
[1] Nazarbayev Univ, Inst Smart Syst & AI, Astana, Kazakhstan
Keywords
Speech commands recognition; text-to-speech; Kazakh Speech Corpus; voice commands; data-centric AI
DOI
10.1109/RAAI59955.2023.10601292
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Speech Command Recognition (SCR) is rapidly gaining prominence due to its diverse applications, such as virtual assistants, smart homes, hands-free navigation, and voice-controlled industrial machinery. In this paper, we present a data-centric approach to creating SCR systems for low-resource languages, focusing on the Kazakh language. By leveraging synthetic data generated by Text-to-Speech (TTS) and data extracted from a large-scale speech corpus, we created the Kazakh-language equivalent of the Google Speech Commands dataset. We also compiled the Kazakh Speech Commands dataset from data collected from 119 participants. This dataset was used to benchmark the performance of the Keyword-MLP model trained on our synthetic dataset. The model achieves 89.79% accuracy on the real-world data, demonstrating the efficacy of our approach. Our work can serve as a recipe for creating customized speech command datasets, including for low-resource languages, obviating the need for laborious and costly human data collection.
Pages: 286-291
Page count: 6