Acoustic model adaptation based on pronunciation variability analysis for non-native speech recognition

Cited by: 20
Authors
Oh, Yoo Rhee [1]
Yoon, Jae Sam [1]
Kim, Hong Kook [1]
Affiliation
[1] Gwangju Inst Sci & Technol, Dept Informat & Commun, Kwangju 500712, South Korea
Keywords
speech recognition; non-native speech; knowledge-based pronunciation variability; data-driven pronunciation variability; state-tying; state-clustering; decision tree; acoustic model adaptation
DOI
10.1016/j.specom.2006.10.006
Chinese Library Classification (CLC) code
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper investigates pronunciation variability between native and non-native speakers and proposes a novel acoustic model adaptation method, based on pronunciation variability analysis, to improve speech recognition performance for non-native speakers. The proposed method is performed in two steps: analysis of the pronunciation variability of non-native speech, and acoustic model adaptation based on that analysis. To obtain informative variant phonetic units, the pronunciation variability of non-native speech is analyzed in two ways: a knowledge-based approach and a data-driven approach. Next, for each approach, the acoustic model corresponding to each informative variant phonetic unit is adapted so that the state-tying of the acoustic model for non-native speech reflects the phonetic variability. For further improvement, a conventional acoustic model adaptation method such as MLLR and/or MAP is combined with the proposed method. Continuous Korean-English speech recognition experiments show that the proposed method achieves an average word error rate reduction of 16.76% and 12.80% for the knowledge-based and data-driven approaches, respectively, compared with a baseline speech recognition system trained on native speech. Moreover, applying MLLR and MAP adaptation to the acoustic models adapted by the proposed method reduces the average word error rate by 53.45% and 57.14% for the knowledge-based and data-driven approaches, respectively. © 2006 Elsevier B.V. All rights reserved.
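Note on the reported figures: the abstract does not say whether the word error rate (WER) reductions are relative or absolute. Assuming they are relative to the baseline WER, which is the common convention but is not confirmed by this record, the calculation can be sketched as follows. The function name and the input values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction, in percent, of an adapted system
    with respect to a baseline system. Both inputs are WERs in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WERs for illustration only (not values from the paper).
baseline = 20.0  # baseline system trained on native speech
adapted = 15.0   # system after acoustic model adaptation
print(relative_wer_reduction(baseline, adapted))  # prints 25.0
```

Under this reading, a reduction of 53.45% would mean the adapted system's WER is a little under half of the baseline WER, whatever the baseline value is.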
Pages: 59-70
Page count: 12