CROSS-LINGUAL SPEECH RECOGNITION UNDER RUNTIME RESOURCE CONSTRAINTS

Cited by: 9
Authors
Yu, Dong [1]
Deng, Li [1]
Liu, Peng [1]
Wu, Jian [1]
Gong, Yifan [1]
Acero, Alex [1]
Affiliation
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
Cross-lingual speech recognition; Kullback-Leibler divergence; lexicon conversion; senone mapping; resource constraint
DOI
10.1109/ICASSP.2009.4960553
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper proposes and compares four cross-lingual and bilingual automatic speech recognition techniques under the constraint that only the acoustic model (AM) of the native language is used at runtime. The first three techniques fall into the category of lexicon conversion, in which each phoneme sequence (PHS) in the foreign-language (FL) lexicon is mapped to a native-language (NL) phoneme sequence. The first technique determines the PHS mapping from International Phonetic Alphabet (IPA) features. The second and third techniques are data-driven: they convert each PHS into the corresponding context-independent and context-dependent hidden Markov models (HMMs), respectively, and search for the NL PHS with the least Kullback-Leibler divergence (KLD) between the FL and NL HMMs. The fourth technique falls into the category of AM merging, in which the FL AM is merged into the NL AM by mapping each senone in the FL AM to the NL senone with the minimum KLD. We discuss the strengths and limitations of each technique, report empirical results on recognizing English utterances with a Korean recognizer, and demonstrate a high correlation between the average KLD and the word error rate (WER). The results show that the AM merging technique performs best, achieving a 60% relative WER reduction over the IPA-based technique.
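The two KLD-based ideas in the abstract (mapping each FL senone to its nearest NL senone, and picking the NL phoneme sequence whose HMM states are closest to those of an FL phoneme sequence) can be illustrated with a minimal sketch. It assumes each senone or HMM state is a single diagonal-covariance Gaussian, uses the directional KL(FL || NL), and compares phoneme sequences only when they have the same number of states, summing state-by-state KLDs. These are simplifying assumptions: the abstract does not specify the model structure, KLD variant, or alignment used, and all function names below are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: a senone / HMM state is one diagonal Gaussian (means, variances).
# Real senones are typically Gaussian mixtures, for which KLD has no closed form.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def kld_diag_gauss(p: Gaussian, q: Gaussian) -> float:
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    mu_p, var_p = p
    mu_q, var_q = q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def map_senones(fl: Dict[str, Gaussian], nl: Dict[str, Gaussian]) -> Dict[str, str]:
    """AM merging sketch: map each FL senone to the NL senone with minimum KLD."""
    return {
        fl_id: min(nl, key=lambda nl_id: kld_diag_gauss(fl_g, nl[nl_id]))
        for fl_id, fl_g in fl.items()
    }


def phs_kld(fl_states: List[Gaussian], nl_states: List[Gaussian]) -> float:
    """Sum of state-wise KLDs; sequences of different length are never chosen."""
    if len(fl_states) != len(nl_states):
        return math.inf
    return sum(kld_diag_gauss(p, q) for p, q in zip(fl_states, nl_states))


def best_nl_phs(fl_states: List[Gaussian],
                candidates: Dict[str, List[Gaussian]]) -> str:
    """Lexicon-conversion sketch: NL phoneme sequence with the least total KLD."""
    return min(candidates, key=lambda name: phs_kld(fl_states, candidates[name]))


if __name__ == "__main__":
    nl = {"ko_a": ([0.0, 0.0], [1.0, 1.0]), "ko_b": ([3.0, 3.0], [1.0, 1.0])}
    fl = {"en_x": ([0.2, -0.1], [1.0, 1.0]), "en_y": ([2.8, 3.1], [1.0, 1.0])}
    print(map_senones(fl, nl))  # {'en_x': 'ko_a', 'en_y': 'ko_b'}
```

With equal unit variances the KLD reduces to half the squared Euclidean distance between means, so each toy English senone maps to the nearer Korean one. A production system would instead need a GMM-level KLD approximation and a search over NL sequences of varying length.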
Pages: 4193-4196
Number of pages: 4
Related papers (50 in total)
  • [31] Unsupervised Adversarial Domain Adaptation for Cross-Lingual Speech Emotion Recognition
    Latif, Siddique
    Qadir, Junaid
    Bilal, Muhammad
    2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2019
  • [32] Cost-efficient cross-lingual adaptation of a speech recognition system
    Callejas, Zoraida
    Nouza, Jan
    Cerva, Petr
    López-Cózar, Ramón
    Advances in Intelligent and Soft Computing, 2009, vol. 57, pp. 331-338
  • [33] CROSS-LINGUAL PHONEME MAPPING FOR LANGUAGE ROBUST CONTEXTUAL SPEECH RECOGNITION
    Patel, Ami
    Li, David
    Cho, Eunjoon
    Aleksic, Petar
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, pp. 5924-5928
  • [34] Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition
    Hernandez, Abner
    Perez-Toro, Paula Andrea
    Noeth, Elmar
    Orozco-Arroyave, Juan Rafael
    Maier, Andreas
    Yang, Seung Hee
    INTERSPEECH 2022, 2022, pp. 51-55
  • [35] CROSS-LINGUAL CONTEXT SHARING AND PARAMETER-TYING FOR MULTI-LINGUAL SPEECH RECOGNITION
    Mohan, Aanchan
    Rose, Richard
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, pp. 126-131
  • [36] Cross-lingual Dialog Model for Speech to Speech Translation
    Ettelaie, Emil
    Georgiou, Panayiotis G.
    Narayanan, Shrikanth
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, pp. 1173-1176
  • [37] Cross-lingual Named Entity Recognition
    Steinberger, Ralf
    Pouliquen, Bruno
    LINGUISTICAE INVESTIGATIONES, 2007, vol. 30, no. 1, pp. 135-162
  • [38] Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR
    Klejch, Ondrej
    Wallington, Electra
    Bell, Peter
    INTERSPEECH 2022, 2022, pp. 2288-2292
  • [39] Contribution of modulation spectral features for cross-lingual speech emotion recognition under noisy reverberant conditions
    Guo, Taiyang
    Li, Sixia
    Kidani, Shunsuke
    Okada, Shogo
    Unoki, Masashi
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, pp. 2221-2227
  • [40] A many-to-one phone mapping approach for cross-lingual speech recognition
    Do, Van Hai
    Chen, Nancy F.
    Lim, Boon Pang
    Hasegawa-Johnson, Mark
    2016 IEEE RIVF INTERNATIONAL CONFERENCE ON COMPUTING & COMMUNICATION TECHNOLOGIES, RESEARCH, INNOVATION, AND VISION FOR THE FUTURE (RIVF), 2016, pp. 120-124