Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

被引:0
|
作者
Kuzdeuov, Askat [1 ]
Nurgaliyev, Shakhizat [1 ]
Turmakhan, Diana [1 ]
Laiyk, Nurkhan [1 ]
Varol, Huseyin Atakan [1 ]
机构
[1] Nazarbayev Univ, Inst Smart Syst & AI, Astana, Kazakhstan
关键词
Speech commands recognition; text-to-speech; Kazakh Speech Corpus; voice commands; data-centric AI;
D O I
10.1109/RAAI59955.2023.10601292
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Speech Command Recognition (SCR) is rapidly gaining prominence due to its diverse applications, such as virtual assistants, smart homes, hands-free navigation, and voice-controlled industrial machinery. In this paper, we present a data-centric approach to creating SCR systems for low-resource languages, particularly focusing on the Kazakh language. By leveraging synthetic data generated by Text-to-Speech (TTS) and data extracted from a large-scale speech corpus, we successfully created the Kazakh language equivalent of the Google Speech Commands dataset. Moreover, we also compiled the Kazakh Speech Commands dataset with data collected from 119 participants. This dataset was used to benchmark the performance of the Keyword-MLP model trained using our synthetic dataset. The results showed that the model achieves 89.79% accuracy for the real-world data demonstrating the efficacy of our approach. Our work can serve as a recipe for creating customized speech command datasets, including for low-resource languages, obviating the need for laborious and costly human data collection.
引用
收藏
页码:286 / 291
页数:6
相关论文
共 50 条
  • [1] You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
    Laptev, Aleksandr
    Korostik, Roman
    Svischev, Aleksey
    Andrusenko, Andrei
    Medennikov, Ivan
    Rybin, Sergey
    2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020, : 439 - 444
  • [2] On building phonetically and prosodically rich speech corpus for text-to-speech synthesis
    Matousek, Jindrich
    Romportl, Jan
    PROCEEDINGS OF THE SECOND IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2006, : 442 - +
  • [3] TERMINAL INTEGRATES SPEECH RECOGNITION AND TEXT-TO-SPEECH SYNTHESIS.
    Gauvain, Jean-Luc
    Gangolf, Jean-Jacques
    Speech Technology, 1983, 2 (01): : 25 - 28
  • [4] Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition
    Ni, Junrui
    Wang, Liming
    Gao, Heting
    Qian, Kaizhi
    Zhang, Yang
    Chang, Shiyu
    Hasegawa-Johnson, Mark
    INTERSPEECH 2022, 2022, : 461 - 465
  • [5] IndicSpeech: Text-to-Speech Corpus for Indian Languages
    Srivastava, Nimisha
    Mukhopadhyay, Rudrabha
    Prajwal, K. R.
    Jawahar, C., V
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 6417 - 6422
  • [6] RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis
    Zandie, Rohola
    Mahoor, Mohammad H.
    Madsen, Julia
    Emamian, Eshrat S.
    INTERSPEECH 2021, 2021, : 2751 - 2755
  • [7] Speech Enhancement of Noisy and Reverberant Speech for Text-to-Speech
    Valentini-Botinhao, Cassia
    Yamagishi, Junichi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (08) : 1420 - 1433
  • [8] Design of a Yoruba Language Speech Corpus for the Purposes of Text-to-Speech (TTS) Synthesis
    Dagba, Theophile K.
    Aoga, John O. R.
    Fanou, Codjo C.
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2016, PT I, 2016, 9621 : 161 - 169
  • [9] Automatic Speech Recognition Used for Intelligibility Assessment of Text-to-Speech Systems
    Vich, Robert
    Nouza, Jan
    Vondra, Martin
    VERBAL AND NONVERBAL FEATURES OF HUMAN-HUMAN AND HUMAN-MACHINE INTERACTIONS, 2008, 5042 : 136 - +
  • [10] Text and Speech Corpora for Text-To-Speech Synthesis of Tales
    Doukhan, David
    Rosset, Sophie
    Rilliard, Albert
    d'Alessandro, Christophe
    Adda-Decker, Martine
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 1003 - 1010