Global RNN Transducer Models For Multi-dialect Speech Recognition

Cited by: 0
Authors
Fukuda, Takashi [1]
Thomas, Samuel [2]
Suzuki, Masayuki [1]
Kurata, Gakuto [1]
Saon, George [2]
Kingsbury, Brian [2]
Affiliations
[1] IBM Res AI, Chuo Ku, Hakozaki Cho, Tokyo, Japan
[2] IBM TJ Watson Res Ctr, IBM Res AI, Yorktown Hts, NY USA
Source
INTERSPEECH 2022
Keywords
End-to-end ASR; recurrent neural network transducer; multi-dialect; computationally inexpensive
DOI
10.21437/Interspeech.2022-165
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Constructing a single, unified automatic speech recognition (ASR) model that works effectively across the various dialects of a language is a challenging problem. Although many recently proposed approaches are effective, they are computationally more expensive than the conventional approach of using ASR models designed separately for each dialect. In this paper, we propose a novel modeling technique, based on recurrent neural network transducers (RNN-T), for constructing accurate multi-dialect speech recognition systems with a single unified model that incurs no extra computational cost at decoding time. Once a model has been created, the same decoding settings can be used across all dialects. In the proposed approach, an RNN-T model with a shared encoder, a common joint network, and multi-branch prediction networks is first constructed. After each prediction network has been trained on the ASR task for its corresponding dialect, an interpolation step combines the multi-branch prediction networks back into a single, computationally efficient branch. The effectiveness of the proposed technique is shown on ASR tasks covering major English dialects. The proposed method approaches oracle performance and improves on dialect-specific models by 15-30% relative in dialect-agnostic conditions.
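The merge step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the interpolation rule, so the code below assumes a plain weighted average in parameter space; the function name `interpolate_branches`, the parameter names, and the toy values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Parameter name -> flattened weight values (a stand-in for real tensors).
Params = Dict[str, List[float]]


def interpolate_branches(branches: List[Params], weights: List[float]) -> Params:
    """Merge per-dialect prediction-network parameters into one set.

    Assumption: a normalized weighted average of parameters. The paper's
    actual interpolation procedure may differ.
    """
    if not branches or len(branches) != len(weights):
        raise ValueError("need at least one branch and one weight per branch")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    coeffs = [w / total for w in weights]  # normalize so coefficients sum to 1

    merged: Params = {}
    for name, reference in branches[0].items():
        # Average each parameter element-wise across all branches.
        merged[name] = [
            sum(c * branch[name][i] for c, branch in zip(coeffs, branches))
            for i in range(len(reference))
        ]
    return merged


# Hypothetical toy branches standing in for two dialect-specific networks.
branch_a = {"embed": [1.0, 2.0], "lstm": [0.0, 4.0]}
branch_b = {"embed": [3.0, 2.0], "lstm": [2.0, 0.0]}
merged = interpolate_branches([branch_a, branch_b], [1.0, 1.0])
print(merged)
```

Because the merged result has the same shape as any single branch, decoding with it costs the same as decoding with one dialect-specific prediction network, which is the property the abstract emphasizes.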
Pages: 3138-3142
Page count: 5