Tibetan Multi-Dialect Speech and Dialect Identity Recognition

Cited by: 7
Authors
Zhao, Yue [1]
Yue, Jianjian [1]
Song, Wei [1]
Xu, Xiaona [1]
Li, Xiali [1]
Wu, Licheng [1]
Ji, Qiang [2]
Affiliations
[1] Minzu Univ China, Sch Informat & Engn, Beijing 100081, Peoples R China
[2] Rensselaer Polytech Inst, JEC 7004, Troy, NY 12180 USA
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2019 / Vol. 60 / No. 3
Keywords
Tibetan multi-dialect speech recognition; dialect identification; multi-task learning; WaveNet model
DOI
10.32604/cmc.2019.05636
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Tibetan has so far had very limited resources for conventional automatic speech recognition: for some dialects it lacks sufficient data, sub-word units, lexicons, and word inventories. In most prior work, speech content recognition and dialect classification have been treated as two independent tasks and modeled separately, although the two tasks are highly correlated. In this paper, we present a multi-task WaveNet model that performs Tibetan multi-dialect speech recognition and dialect identification simultaneously. It avoids building a pronunciation dictionary and performing word segmentation for new dialects, while allowing speech recognition and dialect identification to be trained in a single model. The experimental results show that our method can recognize speech content for different Tibetan dialects and identify the dialect with high accuracy using a unified model. Including dialect information in the output during training can improve multi-dialect speech recognition accuracy, and the low-resource dialects achieved higher speech content recognition rates and dialect classification accuracy with the multi-dialect, multi-task recognition model than with task-specific models.
Pages: 1223-1235
Number of pages: 13