Implicit modelling of pronunciation variation in automatic speech recognition

Cited by: 27
Authors
Hain, T [1]
Affiliation
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Keywords
automatic speech recognition; pronunciation modelling; acoustic modelling; hidden Markov models; pronunciation dictionaries; single pronunciations; parameter tying; phonetic decision trees; state clustering; conversational speech recognition; Hidden Model Sequence Models
DOI
10.1016/j.specom.2005.03.008
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Modelling of pronunciation variability is an important task for the acoustic model of an automatic speech recognition system. Good pronunciation models contribute to the robustness and generic applicability of a speech recogniser. Usually, pronunciation modelling is associated with a lexicon that allows explicit control over the selection of appropriate HMMs for a particular word. However, the use of data-driven clustering techniques or specific parameter tying techniques has considerable impact on this form of model selection and the construction of a task-optimal dictionary. Most large vocabulary speech recognition systems make use of a dictionary with multiple possible pronunciation variants per word. By manual addition of pronunciation variants, explicit human knowledge is used in the recognition process. For reasons of complexity, the optimisation of manual entries for performance is often not feasible. In this paper, a method for the stepwise reduction of the number of pronunciation variants per word to one is described. By doing so in a way consistent with the classification procedure, pronunciation variation is modelled implicitly. It is shown that the use of single pronunciation dictionaries provides similar or better word error rate performance, achieved both on Wall Street Journal and Switchboard data. The use of single pronunciation dictionaries in conjunction with Hidden Model Sequence Models, as an example of an implicit pronunciation modelling technique, shows further improvements. © 2005 Elsevier B.V. All rights reserved.
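As a rough illustration of what reducing a multi-variant dictionary to one pronunciation per word can look like, the sketch below keeps the variant observed most often in aligned training data. This is not the stepwise procedure of the paper, which the abstract does not spell out; the selection rule, the function name `to_single_pronunciation`, the phone symbols, and the counts are all hypothetical and chosen only for the example.

```python
def to_single_pronunciation(lexicon, variant_counts):
    """Return a dictionary with exactly one pronunciation per word.

    lexicon:        word -> list of pronunciation variants (tuples of phones)
    variant_counts: word -> {variant: how often it was chosen in alignment}
    Assumption: the most frequently aligned variant is kept. Words with no
    counts, and ties, fall back to the first variant listed in the lexicon.
    """
    single = {}
    for word, variants in lexicon.items():
        counts = variant_counts.get(word, {})
        # max() returns the first maximal element, which gives the fallback
        # to the first listed variant for unseen words and ties.
        single[word] = max(variants, key=lambda v: counts.get(v, 0))
    return single


# Hypothetical toy data, not taken from the paper.
lexicon = {
    "and": [("ae", "n", "d"), ("ax", "n"), ("ax", "n", "d")],
    "the": [("dh", "ax"), ("dh", "iy")],
    "data": [("d", "ey", "t", "ax"), ("d", "ae", "t", "ax")],
}
variant_counts = {
    "and": {("ax", "n"): 120, ("ae", "n", "d"): 45, ("ax", "n", "d"): 30},
    "the": {("dh", "ax"): 300, ("dh", "iy"): 80},
}

single = to_single_pronunciation(lexicon, variant_counts)
for word, pron in single.items():
    print(word, " ".join(pron))
```

With a single entry per word, the remaining variation has to be absorbed by the acoustic models themselves, which is the sense in which the abstract calls the modelling implicit.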
Pages: 171-188
Number of pages: 18
Related papers
50 in total
  • [11] Automatic speech recognition: Reliability and pedagogical implications for teaching pronunciation
    Kim, IS
    EDUCATIONAL TECHNOLOGY & SOCIETY, 2006, 9 (01) : 322 - 334
  • [12] Improving English Pronunciation via Automatic Speech Recognition Technology
    Li, Meihui
    Han, Meiting
    Chen, Zejia
    Mo, Yiling
    Chen, Xiujuan
    Liu, Xiaobin
    2017 INTERNATIONAL SYMPOSIUM ON EDUCATIONAL TECHNOLOGY (ISET 2017), 2017, : 224 - 228
  • [13] ARABIC SPEECH PRONUNCIATION RECOGNITION AND CORRECTION USING AUTOMATIC SPEECH RECOGNIZER (ASR)
    Dahan, H. B.
    Mannan, A.
    INTED2012: INTERNATIONAL TECHNOLOGY, EDUCATION AND DEVELOPMENT CONFERENCE, 2012, : 4009 - 4016
  • [14] Improving English pronunciation via automatic speech recognition technology
    Liu, Xiaobin
    Xu, Manfei
    Li, Meihui
    Han, Meiting
    Chen, Zejia
    Mo, Yiling
    Chen, Xiujuan
    Liu, Minjia
    INTERNATIONAL JOURNAL OF INNOVATION AND LEARNING, 2019, 25 (02) : 126 - 140
  • [15] Automatic speech recognition (ASR) for the diagnosis of pronunciation of speech sound disorders in Korean children
    Ahn, Taekyung
    Hong, Yeonjung
    Im, Younggon
    Kim, Do Hyung
    Kang, Dayoung
    Jeong, Joo Won
    Kim, Jae Won
    Kim, Min Jung
    Cho, Ah-Ra
    Nam, Hosung
    Jang, Dae-Hyun
    CLINICAL LINGUISTICS & PHONETICS, 2024,
  • [16] A systematic literature review of research on automatic speech recognition in EFL pronunciation
    Liu, Yao
    Ab Rahman, Faizahani Binti
    Zain, Farah Binti Mohamad
    COGENT EDUCATION, 2025, 12 (01):
  • [17] A NEW APPROACH TO SPEAKER ADAPTATION BY MODELING PRONUNCIATION IN AUTOMATIC SPEECH RECOGNITION
    SCHIEL, F
    SPEECH COMMUNICATION, 1993, 13 (3-4) : 281 - 286
  • [18] Automatic Pronunciation Scoring for Mandarin Proficiency Test based on Speech Recognition
    Liu, Yang
    Yang, Chunting
    Ma, Weifeng
    2009 INTERNATIONAL SYMPOSIUM ON INTELLIGENT UBIQUITOUS COMPUTING AND EDUCATION, 2009, : 168 - 171
  • [19] Combined Acoustic and Pronunciation Modelling for Non-Native Speech Recognition
    Bouselmi, G.
    Fohr, D.
    Illina, I.
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1209 - +
  • [20] Using accent-specific pronunciation modelling for robust speech recognition
    Humphries, JJ
    Woodland, PC
    Pearce, D
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 2324 - 2327