Implicit modelling of pronunciation variation in automatic speech recognition

Cited by: 27
Authors
Hain, T [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Keywords
automatic speech recognition; pronunciation modelling; acoustic modelling; hidden Markov models; pronunciation dictionaries; single pronunciations; parameter tying; phonetic decision trees; state clustering; conversational speech recognition; Hidden Model Sequence Models
DOI
10.1016/j.specom.2005.03.008
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Modelling pronunciation variability is an important task for the acoustic model of an automatic speech recognition system. Good pronunciation models contribute to the robustness and generic applicability of a speech recogniser. Pronunciation modelling is usually associated with a lexicon that allows explicit control over the selection of appropriate HMMs for a particular word. However, the use of data-driven clustering techniques or specific parameter tying techniques has considerable impact on this form of model selection and on the construction of a task-optimal dictionary. Most large vocabulary speech recognition systems use a dictionary with multiple possible pronunciation variants per word. Adding pronunciation variants manually brings explicit human knowledge into the recognition process, but for reasons of complexity, optimising manual entries for performance is often not feasible. This paper describes a method for the stepwise reduction of the number of pronunciation variants per word to one. By doing so in a way consistent with the classification procedure, pronunciation variation is modelled implicitly. Single pronunciation dictionaries are shown to provide similar or better word error rate performance on both Wall Street Journal and Switchboard data. Using single pronunciation dictionaries in conjunction with Hidden Model Sequence Models, an example of an implicit pronunciation modelling technique, yields further improvements. © 2005 Elsevier B.V. All rights reserved.
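The abstract does not spell out how a variant is selected for each word, so the following is only an illustrative sketch of what reducing a multiple-pronunciation dictionary to a single-pronunciation one could look like. It assumes a simple frequency criterion (keep the variant chosen most often in a forced alignment of the training data); that criterion, the function name `to_single_pronunciation`, and the toy phone strings are all assumptions made here and are not taken from the paper.

```python
from collections import Counter

def to_single_pronunciation(lexicon, aligned_counts):
    """Keep one pronunciation variant per word.

    lexicon:        dict mapping word -> list of variants (tuples of phones)
    aligned_counts: dict mapping word -> Counter of how often each variant
                    was chosen in a (hypothetical) forced alignment
    Words without alignment counts keep their first listed variant.
    """
    single = {}
    for word, variants in lexicon.items():
        counts = aligned_counts.get(word, Counter())
        # max() returns the first maximal item, so ties and unseen words
        # fall back to the variant listed first in the lexicon.
        single[word] = max(variants, key=lambda v: counts.get(v, 0))
    return single

# Toy data, invented for illustration only.
lexicon = {
    "the": [("dh", "iy"), ("dh", "ax")],
    "either": [("iy", "dh", "er"), ("ay", "dh", "er")],
}
aligned_counts = {
    "the": Counter({("dh", "ax"): 90, ("dh", "iy"): 10}),
}

single_lexicon = to_single_pronunciation(lexicon, aligned_counts)
for word, phones in single_lexicon.items():
    print(word, " ".join(phones))
```

In a sketch like this, the variation that the discarded variants used to describe would have to be absorbed by the acoustic models themselves, which is the sense in which the abstract speaks of modelling pronunciation variation implicitly.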
Pages: 171-188
Number of pages: 18