CROSS-LINGUAL SPEECH RECOGNITION UNDER RUNTIME RESOURCE CONSTRAINTS

被引:9
|
作者
Yu, Dong [1 ]
Deng, Li [1 ]
Liu, Peng [1 ]
Wu, Jian [1 ]
Gong, Yifan [1 ]
Acero, Alex [1 ]
机构
[1] Microsoft Corp, Redmond, WA 98052 USA
关键词
Cross-lingual speech recognition; Kullback-Leibler divergence; lexicon conversion; senone mapping; resource constraint;
D O I
10.1109/ICASSP.2009.4960553
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
This paper proposes and compares four cross-lingual and bilingual automatic speech recognition techniques under the constraint that only the acoustic model (AM) of the native language is used at runtime. The first three techniques fall into the category of lexicon conversion where each phoneme sequence (PHS) in the foreign language (FL) lexicon is mapped into the native language (NL) phoneme sequence. The first technique determines the PHS mapping through the international phonetic alphabet (IPA) features; The second and third techniques are data-driven. They determine the mapping by converting the PHS into corresponding context-independent and context-dependent hidden Markov models (HMMs) respectively and searching for the NL PHS with the least Kullback-Leibler divergence (KLD) between the HMMs. The fourth technique falls into the category of AM merging where the FL's AM is merged into the NL's AM by mapping each senone in the FL's AM to the senone in the NL's AM with the minimum KLD. We discuss the strengths and limitations of each technique developed, report empirical evaluation results on recognizing English utterances with a Korean recognizer, and demonstrate the high correlation between the average KLD and the word error rate (WER). The results show that the AM merging technique performs the best, achieving 60% relative WER reduction over the IPA-based technique.
引用
收藏
页码:4193 / 4196
页数:4
相关论文
共 50 条
  • [21] Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition
    Cahyawijaya, Samuel
    Lovenia, Holy
    Chung, Willy
    Frieske, Rita
    Liu, Zihan
    Fung, Pascale
    INTERSPEECH 2023, 2023, : 3352 - 3356
  • [22] Zero-Resource Cross-Lingual Named Entity Recognition
    Bari, M. Saiful
    Joty, Shafiq
    Jwalapuram, Prathyusha
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 7415 - 7423
  • [23] Improving cross-lingual low-resource speech recognition by Task-based Meta PolyLoss
    Chen, Yaqi
    Zhang, Hao
    Yang, Xukui
    Zhang, Wenlin
    Qu, Dan
    COMPUTER SPEECH AND LANGUAGE, 2024, 87
  • [24] CROSS-LINGUAL TRANSFER LEARNING FOR LOW-RESOURCE SPEECH TRANSLATION
    Khurana, Sameer
    Dawalatabad, Nauman
    Laurent, Antoine
    Vicente, Luis
    Gimeno, Pablo
    Mingote, Victoria
    Glass, James
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 670 - 674
  • [25] Cross-Lingual Self-training to Learn Multilingual Representation for Low-Resource Speech Recognition
    Zi-Qiang Zhang
    Yan Song
    Ming-Hui Wu
    Xin Fang
    Ian McLoughlin
    Li-Rong Dai
    Circuits, Systems, and Signal Processing, 2022, 41 : 6827 - 6843
  • [26] Cross-Lingual Self-training to Learn Multilingual Representation for Low-Resource Speech Recognition
    Zhang, Zi-Qiang
    Song, Yan
    Wu, Ming-Hui
    Fang, Xin
    McLoughlin, Ian
    Dai, Li-Rong
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (12) : 6827 - 6843
  • [27] A Comparative Study of BNF and DNN Multilingual Training on Cross-lingual Low-resource Speech Recognition
    Xu, Haihua
    Van Hai Do
    Xiao, Xiong
    Chng, Eng-Siong
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2132 - 2136
  • [28] Cross-Lingual Phone Mapping for Large Vocabulary Speech Recognition of Under-Resourced Languages
    Van Hai Do
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (02): : 285 - 295
  • [29] UNSUPERVISED CROSS-LINGUAL SPEECH EMOTION RECOGNITION USING PSEUDO MULTILABEL
    Li, Fin
    Yan, Nan
    Wang, Lan
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 366 - 373
  • [30] Multilingual, Cross-lingual, and Monolingual Speech Emotion Recognition on EmoFilm Dataset
    Atmaja, Bagus Tris
    Sasou, Akira
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1019 - 1025