Implicit modelling of pronunciation variation in automatic speech recognition

Cited by: 27
Authors
Hain, T [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Keywords
automatic speech recognition; pronunciation modelling; acoustic modelling; hidden Markov models; pronunciation dictionaries; single pronunciations; parameter tying; phonetic decision trees; state clustering; conversational speech recognition; Hidden Model Sequence Models
DOI
10.1016/j.specom.2005.03.008
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Modelling of pronunciation variability is an important task for the acoustic model of an automatic speech recognition system. Good pronunciation models contribute to the robustness and generic applicability of a speech recogniser. Usually, pronunciation modelling is associated with a lexicon that allows explicit control over the selection of appropriate HMMs for a particular word. However, the use of data-driven clustering techniques or specific parameter tying techniques has considerable impact on this form of model selection and on the construction of a task-optimal dictionary. Most large vocabulary speech recognition systems make use of a dictionary with multiple possible pronunciation variants per word. By manual addition of pronunciation variants, explicit human knowledge is used in the recognition process. For reasons of complexity, the optimisation of manual entries for performance is often not feasible. In this paper, a method for the stepwise reduction of the number of pronunciation variants per word to one is described. By doing so in a way consistent with the classification procedure, pronunciation variation is modelled implicitly. It is shown that the use of single pronunciation dictionaries provides similar or better word error rate performance on both Wall Street Journal and Switchboard data. The use of single pronunciation dictionaries in conjunction with Hidden Model Sequence Models, as an example of an implicit pronunciation modelling technique, shows further improvements. (c) 2005 Elsevier B.V. All rights reserved.
Pages: 171-188
Number of pages: 18