MULTI-DIALECT SPEECH RECOGNITION IN ENGLISH USING ATTENTION ON ENSEMBLE OF EXPERTS

Cited by: 13
Authors
Das, Amit [1]
Kumar, Kshitiz [1]
Wu, Jian [1]
Affiliations
[1] Microsoft Speech & Language Grp, Redmond, WA 98052 USA
Keywords
multi-dialect; attention; mixture of experts; acoustic modeling; speech recognition; DEEP NEURAL-NETWORK
DOI
10.1109/ICASSP39728.2021.9413952
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In the presence of a wide variety of dialects, training dialect-specific models for each dialect is a demanding task. Previous studies have explored training a single model that is robust across multiple dialects. These studies have used either multi-condition training, multi-task learning, end-to-end modeling, or ensemble modeling. In this study, we further explore using a single model for multi-dialect speech recognition using ensemble modeling. First, we build an ensemble of dialect-specific models (or experts). Then we linearly combine the outputs of the experts using attention weights generated by a long short-term memory (LSTM) network. For comparison purposes, we train a model that jointly learns to recognize and classify dialects using multi-task learning and a second model using multi-condition training. We train all of these models with about 60,000 hours of speech data collected in American English, Canadian English, British English, and Australian English. Experimental results reveal that our best proposed model achieved an average 4.74% word error rate reduction (WERR) compared to the strong baseline model.
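The combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each expert emits a per-frame score vector, and that the attention logits (produced by an LSTM in the paper) are simply given as an input array. The names `combine_experts` and `relative_werr` are hypothetical, and the WERR helper assumes the usual relative definition, which the abstract does not spell out.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def combine_experts(expert_outputs, attention_logits):
    """Linearly combine expert outputs with per-frame attention weights.

    expert_outputs:   array of shape (E, T, C), one (T, C) output per expert.
    attention_logits: array of shape (T, E); in the paper these would come
                      from an LSTM, here they are an assumed input.
    Returns the combined (T, C) output and the (T, E) attention weights.
    """
    weights = softmax(attention_logits, axis=-1)  # each frame's weights sum to 1
    combined = np.einsum("te,etc->tc", weights, expert_outputs)
    return combined, weights


def relative_werr(baseline_wer, new_wer):
    """Relative word error rate reduction in percent (assumed definition)."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four hypothetical experts (e.g. one per dialect), 3 frames, 5 classes.
    outputs = softmax(rng.normal(size=(4, 3, 5)))
    logits = rng.normal(size=(3, 4))
    combined, weights = combine_experts(outputs, logits)
    print(combined.shape, weights.sum(axis=1))
```

Because the weights are a softmax over experts, each combined frame is a convex combination of the expert outputs; if one expert's logit dominates, the result approaches that expert's output alone.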
Pages: 6244-6248
Page count: 5