CROSS-LINGUAL SPEECH RECOGNITION UNDER RUNTIME RESOURCE CONSTRAINTS

Cited by: 9
Authors
Yu, Dong [1]
Deng, Li [1]
Liu, Peng [1]
Wu, Jian [1]
Gong, Yifan [1]
Acero, Alex [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
Cross-lingual speech recognition; Kullback-Leibler divergence; lexicon conversion; senone mapping; resource constraint
DOI
10.1109/ICASSP.2009.4960553
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper proposes and compares four cross-lingual and bilingual automatic speech recognition techniques under the constraint that only the acoustic model (AM) of the native language is used at runtime. The first three techniques fall into the category of lexicon conversion, where each phoneme sequence (PHS) in the foreign language (FL) lexicon is mapped to a native language (NL) phoneme sequence. The first technique determines the PHS mapping through International Phonetic Alphabet (IPA) features. The second and third techniques are data-driven: they determine the mapping by converting the PHS into corresponding context-independent and context-dependent hidden Markov models (HMMs), respectively, and searching for the NL PHS with the least Kullback-Leibler divergence (KLD) between the HMMs. The fourth technique falls into the category of AM merging, where the FL's AM is merged into the NL's AM by mapping each senone in the FL's AM to the senone in the NL's AM with the minimum KLD. We discuss the strengths and limitations of each technique, report empirical evaluation results on recognizing English utterances with a Korean recognizer, and demonstrate the high correlation between the average KLD and the word error rate (WER). The results show that the AM merging technique performs best, achieving a 60% relative WER reduction over the IPA-based technique.
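The minimum-KLD senone mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each senone is a single diagonal-covariance Gaussian, for which the KLD has a closed form; senones in practical systems are usually Gaussian mixtures, where the KLD must be approximated. The direction KL(FL || NL), the function names `kld_diag_gauss` and `map_senones`, and the toy senone names and parameters are all assumptions made for illustration.

```python
import numpy as np

def kld_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    mu_p, var_p, mu_q, var_q = (
        np.asarray(x, dtype=float) for x in (mu_p, var_p, mu_q, var_q)
    )
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def map_senones(fl_senones, nl_senones):
    """Map each FL senone to the NL senone with the minimum KLD.

    Both arguments are dicts: name -> (mean vector, variance vector).
    Assumption: the divergence is taken as KL(FL || NL).
    """
    mapping = {}
    for fl_name, (mu_f, var_f) in fl_senones.items():
        mapping[fl_name] = min(
            nl_senones,
            key=lambda nl_name: kld_diag_gauss(mu_f, var_f, *nl_senones[nl_name]),
        )
    return mapping

# Hypothetical toy senones (two-dimensional features), for illustration only.
fl_senones = {
    "en_s1": ([0.0, 0.0], [1.0, 1.0]),
    "en_s2": ([5.0, 5.0], [1.0, 1.0]),
}
nl_senones = {
    "ko_a": ([0.1, -0.1], [1.0, 1.0]),
    "ko_b": ([4.0, 6.0], [2.0, 2.0]),
}
senone_map = map_senones(fl_senones, nl_senones)
```

With such a mapping, every FL senone is replaced by an existing NL senone, so no additional acoustic model parameters are needed at runtime, which is consistent with the constraint stated in the abstract.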
Pages: 4193-4196
Page count: 4