Acoustic model adaptation based on pronunciation variability analysis for non-native speech recognition

Cited by: 20
Authors
Oh, Yoo Rhee [1 ]
Yoon, Jae Sam [1 ]
Kim, Hong Kook [1 ]
Affiliations
[1] Gwangju Inst Sci & Technol, Dept Informat & Commun, Kwangju 500712, South Korea
Keywords
speech recognition; non-native speech; knowledge-based pronunciation variability; data-driven pronunciation variability; state-tying; state-clustering; decision tree; acoustic model adaptation;
DOI
10.1016/j.specom.2006.10.006
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper investigates pronunciation variability between native and non-native speakers and proposes a novel acoustic model adaptation method, based on pronunciation variability analysis, to improve speech recognition performance for non-native speakers. The proposed method is performed in two steps: analysis of the pronunciation variability of non-native speech, followed by acoustic model adaptation based on that analysis. To obtain informative variant phonetic units, the pronunciation variability of non-native speech is analyzed in two ways: a knowledge-based approach and a data-driven approach. Next, for each approach, the acoustic model corresponding to each informative variant phonetic unit is adapted so that the state-tying of the acoustic model for non-native speech reflects the phonetic variability. For further improvement, a conventional acoustic model adaptation method such as maximum likelihood linear regression (MLLR) and/or maximum a posteriori (MAP) adaptation is combined with the proposed method. Continuous Korean-English speech recognition experiments show that the proposed method achieves an average word error rate reduction of 16.76% and 12.80% for the knowledge-based approach and the data-driven approach, respectively, compared with a baseline speech recognition system trained on native speech. Moreover, applying MLLR and MAP adaptation to the acoustic models adapted by the proposed method yields an average word error rate reduction of 53.45% and 57.14% for the knowledge-based approach and the data-driven approach, respectively. (C) 2006 Elsevier B.V. All rights reserved.
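The word error rate reductions quoted in the abstract are most naturally read as relative reductions against the baseline, although the abstract does not state this explicitly. Under that assumption, a minimal sketch of the calculation follows. The function name `relative_wer_reduction` and the example WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Assumes both arguments are word error rates expressed in the same unit
    (for example, percent) and that the baseline WER is positive.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WER values for illustration only (not reported in the abstract).
example_reduction = relative_wer_reduction(baseline_wer=20.0, adapted_wer=15.0)
print(f"relative WER reduction: {example_reduction:.2f}%")
```

Read this way, a figure such as 16.76% describes the fraction of the baseline errors that were removed, not a drop of 16.76 absolute percentage points.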
Pages: 59-70
Number of pages: 12
Related papers
50 in total
  • [41] Evaluating Intra- and Crosslingual Adaptation for Non-native Speech Recognition in a Bilingual Environment
    Szaszak, Gyoergy
    Garner, Philip N.
    2013 IEEE 4TH INTERNATIONAL CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM), 2013, : 357 - 361
  • [42] Non-native English speech recognition using bilingual English lexicon and acoustic models
    Matsunaga, S
    Ogawa, A
    Yamaguchi, Y
    Imamura, A
    2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL III, PROCEEDINGS, 2003, : 625 - 628
  • [43] Non-native English speech recognition using bilingual English lexicon and acoustic models
    Matsunaga, S
    Ogawa, A
    Yamaguchi, Y
    Imamura, A
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 340 - 343
  • [44] Acoustic Model Adaptation for Speech Recognition
    Shinoda, Koichi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09): : 2348 - 2362
  • [45] Gradeschoolers' linguistic and pragmatic speech adaptation to native and non-native interlocution
    Ravid, D
    Olshtain, E
    Ze'elon, R
    JOURNAL OF PRAGMATICS, 2003, 35 (01) : 71 - 99
  • [46] The Effects of Acoustic and Semantic Enhancements on Perception of Native and Non-Native Speech
    Kato, Misaki
    Baese-Berk, Melissa M.
    LANGUAGE AND SPEECH, 2024, 67 (01) : 40 - 71
  • [47] Re-Examining Phonetic Variability in Native and Non-Native Speech
    Vaughn, Charlotte
    Baese-Berk, Melissa
    Idemaru, Kaori
    PHONETICA, 2019, 76 (05) : 327 - 358
  • [48] Accent neutralization for speech recognition of non-native speakers
    Radzikowski, Kacper
    Forc, Mateusz
    Wang, Le
    Yoshie, Osamu
    Nowak, Robert
    IIWAS2019: THE 21ST INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES, 2019, : 136 - 141
  • [49] Investigating automatic recognition of non-native Arabic speech
    Selouani, Sid-Ahmed
    Alotaibi, Yousef Ajami
    2007 INNOVATIONS IN INFORMATION TECHNOLOGIES, VOLS 1 AND 2, 2007, : 204 - +
  • [50] Dual supervised learning for non-native speech recognition
    Radzikowski, Kacper
    Nowak, Robert
    Wang, Le
    Yoshie, Osamu
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2019, 2019 (1)