Tibetan Multi-Dialect Speech and Dialect Identity Recognition

Cited by: 7
Authors
Zhao, Yue [1]
Yue, Jianjian [1]
Song, Wei [1]
Xu, Xiaona [1]
Li, Xiali [1]
Wu, Licheng [1]
Ji, Qiang [2]
Affiliations
[1] Minzu Univ China, Sch Informat & Engn, Beijing 100081, Peoples R China
[2] Rensselaer Polytech Inst, JEC 7004, Troy, NY 12180 USA
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2019 / Vol. 60 / No. 03
Keywords
Tibetan multi-dialect speech recognition; dialect identification; multi-task learning; WaveNet model
DOI
10.32604/cmc.2019.05636
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The Tibetan language has so far had very limited resources for conventional automatic speech recognition. For some dialects it lacks sufficient data, sub-word units, lexicons, and word inventories. In most prior work, speech content recognition and dialect classification have been treated as two independent tasks and modeled separately, although the two tasks are highly correlated. In this paper, we present a multi-task WaveNet model that performs Tibetan multi-dialect speech recognition and dialect identification simultaneously. It avoids building a pronunciation dictionary and performing word segmentation for new dialects, while allowing speech recognition and dialect identification to be trained in a single model. The experimental results show that our method can simultaneously recognize speech content for different Tibetan dialects and identify the dialect with high accuracy using a unified model. Including dialect information in the output during training improves multi-dialect speech recognition accuracy, and the low-resource dialects achieve a higher speech content recognition rate and dialect classification accuracy with the multi-dialect, multi-task recognition model than with task-specific models.
Pages: 1223-1235
Page count: 13