A CONFIGURABLE MULTILINGUAL MODEL IS ALL YOU NEED TO RECOGNIZE ALL LANGUAGES

Cited by: 4
Authors
Zhou, Long [1]
Li, Jinyu [2]
Sun, Eric [2]
Liu, Shujie [1]
Affiliations
[1] Microsoft Res Asia, Beijing, Peoples R China
[2] Microsoft Speech & Language Grp, Beijing, Peoples R China
Keywords
multilingual speech recognition; configurable multilingual model; transformer-transducer
DOI
10.1109/ICASSP43922.2022.9747905
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Multilingual automatic speech recognition models have shown great promise in recent years because of the simple model training and deployment process. Conventional methods either train a universal multilingual model without taking any language information or with a 1-hot language ID (LID) vector to guide the recognition of the target language. In practice, a multilingual user can be prompted to pre-select several languages he/she can speak. The multilingual model without LID cannot well utilize the language information set by the user while the multilingual model with 1-hot LID can only handle one pre-selected language. In this paper, we propose a novel configurable multilingual model (CMM) which is trained only once but can be configured as different models based on users' choices by extracting language-specific modules together with a universal module from the trained CMM. Particularly, a single CMM can be deployed to any user scenario where the users can pre-select any combination of languages. Trained with 75K hours of transcribed anonymized Microsoft multilingual data and evaluated with 10-language test sets, the proposed CMM improves from the universal multilingual model by 26.0%, 16.9%, and 10.4% relative word error reduction when the user selects 1, 2, or 3 languages, respectively.
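The configuration step described in the abstract (extracting the language-specific modules for a user's selected languages together with the universal module) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the names `ConfiguredModel` and `configure`, the string placeholders standing in for trained modules, and the dictionary layout are hypothetical and are not taken from the paper, which does not describe its implementation in this record.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ConfiguredModel:
    """Hypothetical container for a model configured from a trained CMM."""
    universal: str                    # placeholder for the shared universal module
    language_modules: Dict[str, str]  # placeholders for the selected per-language modules


def configure(universal: str,
              language_modules: Dict[str, str],
              selected: Iterable[str]) -> ConfiguredModel:
    """Keep the universal module plus only the modules for the selected languages.

    No retraining happens here: the function only picks a subset of
    already-trained (placeholder) modules.
    """
    chosen = sorted(set(selected))
    missing = [lang for lang in chosen if lang not in language_modules]
    if missing:
        raise KeyError(f"no language-specific module for: {missing}")
    return ConfiguredModel(
        universal=universal,
        language_modules={lang: language_modules[lang] for lang in chosen},
    )


# Hypothetical trained CMM with three language-specific modules.
trained_modules = {"en": "en-module", "de": "de-module", "fr": "fr-module"}

# A user who pre-selects German and English gets a two-language configuration.
bilingual = configure("universal-module", trained_modules, ["de", "en"])
```

The same trained modules can be reused for any other combination of languages by calling `configure` again with a different selection, which mirrors the "trained only once" property stated in the abstract.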
Pages: 6422-6426
Page count: 5