Tibetan Multi-Dialect Speech and Dialect Identity Recognition

Cited by: 7
Authors
Zhao, Yue [1]
Yue, Jianjian [1]
Song, Wei [1]
Xu, Xiaona [1]
Li, Xiali [1]
Wu, Licheng [1]
Ji, Qiang [2]
Affiliations
[1] Minzu Univ China, Sch Informat & Engn, Beijing 100081, Peoples R China
[2] Rensselaer Polytech Inst, JEC 7004, Troy, NY 12180 USA
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2019 / Vol. 60 / Issue 03
Keywords
Tibetan multi-dialect speech recognition; dialect identification; multi-task learning; WaveNet model
DOI
10.32604/cmc.2019.05636
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Tibetan has so far had very limited resources for conventional automatic speech recognition: some dialects lack sufficient data, sub-word units, lexicons, and word inventories. In most prior work, speech content recognition and dialect classification have been treated as two independent tasks and modeled separately, even though the two tasks are highly correlated. In this paper, we present a multi-task WaveNet model that performs Tibetan multi-dialect speech recognition and dialect identification simultaneously. It avoids building a pronunciation dictionary and performing word segmentation for new dialects, while allowing speech recognition and dialect identification to be trained in a single model. The experimental results show that our method can simultaneously recognize speech content for different Tibetan dialects and identify the dialect with high accuracy using a unified model. Including dialect information in the output during training improves multi-dialect speech recognition accuracy, and low-resource dialects achieve a higher speech content recognition rate and dialect classification accuracy with the multi-dialect, multi-task recognition model than with task-specific models.
Pages: 1223-1235
Page count: 13
Related papers
50 in total
  • [1] An open speech resource for Tibetan multi-dialect and multitask recognition
    Zhao, Yue
    Xu, Xiaona
    Yue, Jianjian
    Song, Wei
    Li, Xiali
    Wu, Licheng
    Ji, Qiang
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2020, 22 (2-3) : 297 - 304
  • [2] Multi-Dialect Arabic Speech Recognition
    Ali, Abbas Raza
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [3] Exploring task-diverse meta-learning on Tibetan multi-dialect speech recognition
    Liu, Yigang
    Zhao, Yue
    Xu, Xiaona
    Xu, Liang
    Zhang, Xubei
    Ji, Qiang
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01):
  • [4] Chinese Multi-Dialect Speech Recognition Based on Instruction Tuning
    Ding, Timin
    Sun, Kai
    Zhang, Xu
    Yu, Jian
    Huang, Degen
    FOURTH SYMPOSIUM ON PATTERN RECOGNITION AND APPLICATIONS, SPRA 2023, 2024, 13162
  • [5] Global RNN Transducer Models For Multi-dialect Speech Recognition
    Fukuda, Takashi
    Thomas, Samuel
    Suzuki, Masayuki
    Kurata, Gakuto
    Saon, George
    Kingsbury, Brian
    INTERSPEECH 2022, 2022, : 3138 - 3142
  • [6] End-to-end Japanese Multi-dialect Speech Recognition and Dialect Identification with Multi-task Learning
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2022, 11 (01)
  • [7] A HIGHLY ADAPTIVE ACOUSTIC MODEL FOR ACCURATE MULTI-DIALECT SPEECH RECOGNITION
    Yoo, Sanghyun
    Song, Inchul
    Bengio, Yoshua
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5716 - 5720
  • [8] MULTI-DIALECT SPEECH RECOGNITION IN ENGLISH USING ATTENTION ON ENSEMBLE OF EXPERTS
    Das, Amit
    Kumar, Kshitiz
    Wu, Jian
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6244 - 6248
  • [9] Breaking the Corpus Bottleneck for Multi-dialect Speech Recognition with Flexible Adapters
    Deng, Tengyue
    Wei, Jianguo
    Yang, Jiahao
    Guo, Minghao
    Ke, Wenjun
    Yang, Xiaokang
    Lu, Wenhuan
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT VII, 2024, 15022 : 3 - 15
  • [10] MULTI-DIALECT SPEECH RECOGNITION WITH A SINGLE SEQUENCE-TO-SEQUENCE MODEL
    Li, Bo
    Sainath, Tara N.
    Sim, Khe Chai
    Bacchiani, Michiel
    Weinstein, Eugene
    Nguyen, Patrick
    Chen, Zhifeng
    Wu, Yonghui
    Rao, Kanishka
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4749 - 4753