CROSS-LINGUAL SPEECH RECOGNITION UNDER RUNTIME RESOURCE CONSTRAINTS

Cited by: 9

Authors:

Yu, Dong [1]

Deng, Li [1]

Liu, Peng [1]

Wu, Jian [1]

Gong, Yifan [1]

Acero, Alex [1]

Affiliation:

[1] Microsoft Corp, Redmond, WA 98052 USA

Source:

2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009

Keywords:

Cross-lingual speech recognition; Kullback-Leibler divergence; lexicon conversion; senone mapping; resource constraint

DOI:

10.1109/ICASSP.2009.4960553

Chinese Library Classification number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

This paper proposes and compares four cross-lingual and bilingual automatic speech recognition techniques under the constraint that only the acoustic model (AM) of the native language is used at runtime. The first three techniques fall into the category of lexicon conversion, where each phoneme sequence (PHS) in the foreign language (FL) lexicon is mapped to a native language (NL) phoneme sequence. The first technique determines the PHS mapping through International Phonetic Alphabet (IPA) features. The second and third techniques are data-driven: they determine the mapping by converting each PHS into the corresponding context-independent and context-dependent hidden Markov models (HMMs), respectively, and searching for the NL PHS with the smallest Kullback-Leibler divergence (KLD) between the HMMs. The fourth technique falls into the category of AM merging, where the FL's AM is merged into the NL's AM by mapping each senone in the FL's AM to the senone in the NL's AM with the minimum KLD. We discuss the strengths and limitations of each technique, report empirical evaluation results on recognizing English utterances with a Korean recognizer, and demonstrate a high correlation between the average KLD and the word error rate (WER). The results show that the AM merging technique performs best, achieving a 60% relative WER reduction over the IPA-based technique.
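
The senone-mapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that each senone is a single diagonal-covariance Gaussian (real senones are typically Gaussian mixtures, for which the KLD has no closed form and must be approximated), and that the divergence is taken in the direction KL(FL || NL), which the abstract does not specify. The function names and the toy senone names (`en_s1`, `ko_s1`, and so on) are hypothetical.

```python
import math


def kl_diag_gaussian(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    kl = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        kl += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return kl


def map_senones(fl_senones, nl_senones):
    """Map each FL senone to the NL senone with the minimum KLD.

    Both arguments are dicts of the form name -> (means, variances).
    Assumption: one diagonal Gaussian per senone (a simplification).
    """
    mapping = {}
    for fl_name, (fl_mu, fl_var) in fl_senones.items():
        mapping[fl_name] = min(
            nl_senones,
            key=lambda nl_name: kl_diag_gaussian(
                fl_mu, fl_var, *nl_senones[nl_name]
            ),
        )
    return mapping


# Toy, made-up parameters (not data from the paper).
fl = {
    "en_s1": ([0.0, 1.0], [1.0, 1.0]),
    "en_s2": ([5.0, 5.0], [2.0, 2.0]),
}
nl = {
    "ko_s1": ([0.1, 0.9], [1.0, 1.0]),
    "ko_s2": ([4.0, 6.0], [2.0, 2.0]),
}

mapping = map_senones(fl, nl)
print(mapping)
```

With a mapping of this kind, every FL senone is replaced by an NL senone, so only the NL acoustic model needs to be loaded at runtime, which is the resource constraint the abstract refers to.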

Pages: 4193-4196

Page count: 4
