Tibetan Multi-Dialect Speech and Dialect Identity Recognition

Cited by: 7
Authors
Zhao, Yue [1 ]
Yue, Jianjian [1 ]
Song, Wei [1 ]
Xu, Xiaona [1 ]
Li, Xiali [1 ]
Wu, Licheng [1 ]
Ji, Qiang [2 ]
Affiliations
[1] Minzu Univ China, Sch Informat & Engn, Beijing 100081, Peoples R China
[2] Rensselaer Polytech Inst, JEC 7004, Troy, NY 12180 USA
Source
CMC-COMPUTERS MATERIALS & CONTINUA, 2019, Vol. 60, No. 3
Keywords
Tibetan multi-dialect speech recognition; dialect identification; multi-task learning; WaveNet model
DOI
10.32604/cmc.2019.05636
CLC Number
TP [Automation and Computer Technology]
Subject Classification Code
0812
Abstract
So far, the Tibetan language has very limited resources for conventional automatic speech recognition: for some dialects there are not enough data, sub-word units, lexicons, or word inventories. Moreover, most prior work has treated speech content recognition and dialect classification as two independent tasks and modeled them separately, even though the two tasks are highly correlated. In this paper, we present a multi-task WaveNet model that performs Tibetan multi-dialect speech recognition and dialect identification simultaneously. It avoids building pronunciation dictionaries and word segmentation for new dialects, while allowing speech recognition and dialect identification to be trained in a single model. The experimental results show that our method can simultaneously recognize speech content for different Tibetan dialects and identify the dialect with high accuracy using a unified model. Including dialect information in the training targets improves multi-dialect speech recognition accuracy, and the low-resource dialects achieve higher speech content recognition rates and dialect classification accuracy with the multi-dialect, multi-task model than with task-specific models.
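The abstract describes the architecture only at a high level. As a rough illustration, the PyTorch sketch below shows one plausible realization of such a multi-task model: a shared stack of dilated convolutions with gated activations (the WaveNet building block) feeds a per-frame CTC head for speech content and a pooled classification head for dialect identity. The class name, feature dimension, channel width, and token/dialect counts are all assumptions made for the example, not details taken from the paper.

    # Illustrative sketch only; hyperparameters and layer layout are
    # assumptions, not the architecture reported in the paper.
    import torch
    import torch.nn as nn

    class MultiTaskWaveNet(nn.Module):
        """Shared WaveNet-style trunk with two heads: per-frame token
        posteriors for CTC speech recognition, and an utterance-level
        dialect classifier."""

        def __init__(self, in_dim=39, channels=128, n_blocks=6,
                     n_tokens=220, n_dialects=3):
            super().__init__()
            self.input_proj = nn.Conv1d(in_dim, channels, kernel_size=1)
            self.filters, self.gates = nn.ModuleList(), nn.ModuleList()
            for i in range(n_blocks):
                d = 2 ** i  # dilation doubles per block, as in WaveNet
                self.filters.append(nn.Conv1d(channels, channels,
                                              kernel_size=3, padding=d, dilation=d))
                self.gates.append(nn.Conv1d(channels, channels,
                                            kernel_size=3, padding=d, dilation=d))
            self.ctc_head = nn.Conv1d(channels, n_tokens, kernel_size=1)
            self.dialect_head = nn.Linear(channels, n_dialects)

        def forward(self, x):  # x: (batch, in_dim, frames), e.g. MFCC features
            h = self.input_proj(x)
            for f, g in zip(self.filters, self.gates):
                # gated activation unit with a residual connection
                h = h + torch.tanh(f(h)) * torch.sigmoid(g(h))
            ctc_logits = self.ctc_head(h)                      # (batch, n_tokens, frames)
            dialect_logits = self.dialect_head(h.mean(dim=2))  # (batch, n_dialects)
            return ctc_logits, dialect_logits

    # Joint training would combine a CTC loss on ctc_logits with a
    # cross-entropy loss on dialect_logits, weighted by a hyperparameter
    # (again an assumption): loss = ctc_loss + lam * ce_loss

The abstract's remark that dialect information appears in the training output also suggests an alternative single-head design: append a dialect label token to each CTC target sequence, so one output layer learns both tasks. The two-head version above is simply one way to make the multi-task structure explicit.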
Pages: 1223-1235 (13 pages)
Related Papers
50 items in total
  • [31] Automatic Speech Recognition for Mixed Dialect Utterances by Mixing Dialect Language Models
    Hirayama, Naoki
    Yoshino, Koichiro
    Itoyama, Katsutoshi
    Mori, Shinsuke
    Okuno, Hiroshi G.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (02) : 373 - 382
  • [32] Teaching English to Jamaican Creole Speakers - Model of a Multi-Dialect Situation
    Craig, D. R.
    LANGUAGE LEARNING, 1966, 16 (1-2) : 49 - 49
  • [33] Speech-in-speech recognition is modulated by familiarity to dialect
    Chin, Jessica L. L.
    Talevska, Elena
    Antoniou, Mark
    INTERSPEECH 2023, 2023, : 3113 - 3116
  • [34] Construction and Analysis of Tibetan Amdo Dialect Speech Dataset for Speech Synthesis
    Zhang, Xinyi
    Lu, Wenhuan
    Zhao, Xinyue
    Zhu, Yi
    Wei, Jianguo
    2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2021, : 49 - 52
  • [35] Chinese dialect speech recognition: a comprehensive survey
    Li, Qiang
    Mai, Qianyu
    Wang, Mandou
    Ma, Mingjuan
    Artificial Intelligence Review, 57
  • [37] Speech Emotion Recognition Based on Henan Dialect
    Cheng, Zichen
    Li, Yan
    Jiu, Mengfei
    Ge, Jiangwei
    COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, VOL. 1, 2022, 878 : 498 - 505
  • [38] An Annotated Speech Corpus of Rare Dialect for Recognition - Take Dali Dialect as an Example
    Huang, Tian
    Yang, Dongqi
    Qin, Wanyun
    Zhang, Shubo
    Li, Binyang
    Li, Yan
    COGNITIVE COMPUTING, ICCC 2021, 2022, 12992 : 3 - 13
  • [39] Automatic speech recognition system for Tunisian dialect
    Masmoudi, Abir
    Bougares, Fethi
    Ellouze, Mariem
    Esteve, Yannick
    Belguith, Lamia
    LANGUAGE RESOURCES AND EVALUATION, 2018, 52 (01) : 249 - 267
  • [40] Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 297 - 301