A maximum model distance approach for HMM-based speech recognition

Cited by: 12
Authors
Kwong, S [1]
He, QH
Man, KF
Tang, KS
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong
[2] S China Univ Technol, Dept Elect Engn, Guangzhou, Peoples R China
[3] City Univ Hong Kong, Dept Elect Engn, Hong Kong, Hong Kong
Keywords
hidden Markov model; maximum likelihood; corrective training; speech recognition; stochastic process
DOI
10.1016/S0031-3203(97)00042-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a new approach to HMM training based on the maximum model distance (MMD) criterion for similar utterances. It differs from the traditional maximum likelihood (ML) approach in that ML considers only the likelihood $P(O^{\nu} \mid \lambda_{\nu})$ for a single utterance, whereas MMD compares the likelihood $P(O^{\nu} \mid \lambda_{\nu})$ against those of similar utterances and maximizes their likelihood differences. Theoretical and practical issues concerning this approach are investigated. In addition, the corrective training of Bahl, Brown, de Souza and Mercer [IEEE Trans. Speech Audio Process. 1(1), (1993)] is examined in the MMD framework, and it is proved to be a special case of the MMD approach. Both speaker-dependent and multi-speaker experiments have been carried out on the Chinese An-set syllables and on the 599 most common utterances from the TIMIT database. Experimental results show that significant error reduction can be achieved with the proposed approach. © 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.
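The contrast the abstract draws between ML and MMD can be illustrated with a minimal sketch. The abstract does not give the exact MMD objective, so the length-normalized gap below (own-model log-likelihood minus the mean log-likelihood under rival models) is an assumption chosen for illustration, not the paper's formula. The function names `log_likelihood` and `model_distance`, and the toy discrete HMM parameters, are likewise hypothetical.

```python
import numpy as np

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start: (N,) initial state probabilities
    trans: (N, N) state transition matrix, rows sum to 1
    emit:  (N, M) emission probabilities, rows sum to 1
    """
    alpha = start * emit[:, obs[0]]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step, then weight by the emission at time t.
            alpha = (alpha @ trans) * emit[:, obs[t]]
        scale = alpha.sum()
        log_p += np.log(scale)   # accumulate log of the scaling factors
        alpha = alpha / scale    # rescale to avoid numerical underflow
    return log_p

def model_distance(obs, own, rivals):
    """Illustrative (assumed) distance: own-model log-likelihood minus the
    mean rival log-likelihood, normalized by the utterance length."""
    own_ll = log_likelihood(obs, *own)
    rival_ll = np.mean([log_likelihood(obs, *r) for r in rivals])
    return (own_ll - rival_ll) / len(obs)

# Toy example: two 2-state models over a 2-symbol alphabet.
model_a = (np.array([0.6, 0.4]),
           np.array([[0.7, 0.3], [0.2, 0.8]]),
           np.array([[0.9, 0.1], [0.6, 0.4]]))
model_b = (np.array([0.5, 0.5]),
           np.array([[0.5, 0.5], [0.5, 0.5]]),
           np.array([[0.2, 0.8], [0.3, 0.7]]))
utterance = [0, 0, 1, 0]

print("ML objective for model_a:", log_likelihood(utterance, *model_a))
print("distance of model_a from rival:", model_distance(utterance, model_a, [model_b]))
```

ML training would adjust `model_a` to raise only the first quantity. A distance-style criterion would instead raise the second, which also penalizes rival models that score the same utterance well.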
Pages: 219-229
Page count: 11