Continuous speech recognition based on general factor dependent acoustic models

Cited by: 4
Authors
Suzuki, H [1]
Zen, H
Nankaku, Y
Miyajima, C
Tokuda, K
Kitamura, T
Affiliations
[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi 4668555, Japan
[2] Nagoya Univ, Dept Media Sci, Nagoya, Aichi 4668603, Japan
Keywords
continuous speech recognition; triphone HMMs; context clustering; Bayesian networks; voice characteristic; noise environment
DOI
10.1093/ietisy/e88-d.3.410
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper describes continuous speech recognition that incorporates additional complementary information, e.g., voice characteristics, speaking styles, linguistic information, and noise environment, into HMM-based acoustic modeling. In speech recognition systems, context-dependent HMMs, i.e., triphones, and tree-based context clustering have commonly been used. In recent years, several attempts have been made to utilize not only phonetic contexts but also additional complementary information, based on context (factor) dependent HMMs. However, when the additional factors for the test data are unobserved, a method for obtaining factor labels is required before decoding. In this paper, we propose a model integration technique based on general factor dependent HMMs for decoding. The integrated HMMs can be used by a conventional decoder as standard triphone HMMs with Gaussian mixture densities. Moreover, by using the results of context clustering, the proposed method can determine an optimal number of mixture components for each state, depending on the degree of influence of the additional factors. Phoneme recognition experiments using voice characteristic labels show significant improvements with a small number of model parameters, and a 19.3% error reduction was obtained in noise environment experiments.
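The integration step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional Gaussians, a known prior over the unobserved factor, and a clustering result given as a map from factor value to tree leaf. The names `integrate_state`, `mixture_density`, `factor_priors`, `factor_to_leaf`, and `leaf_gaussians` are hypothetical. Factor values that share a leaf collapse into one mixture component, so a state that clustering did not split on the factor ends up with a single Gaussian.

```python
import math
from collections import defaultdict


def integrate_state(factor_priors, factor_to_leaf, leaf_gaussians):
    """Marginalize the unobserved factor out of one HMM state.

    factor_priors:  {factor_value: P(factor_value)}           (assumed known)
    factor_to_leaf: {factor_value: leaf_id} from clustering   (hypothetical format)
    leaf_gaussians: {leaf_id: (mean, variance)}
    Returns a list of (weight, mean, variance) mixture components.
    """
    weights = defaultdict(float)
    for factor, prior in factor_priors.items():
        # Factor values tied to the same leaf share one component,
        # so their priors are summed into a single mixture weight.
        weights[factor_to_leaf[factor]] += prior
    return [(w, *leaf_gaussians[leaf]) for leaf, w in weights.items()]


def mixture_density(x, mixture):
    """Evaluate the resulting 1-D Gaussian mixture density at x."""
    total = 0.0
    for w, mean, var in mixture:
        total += w * math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)
    return total


# Illustrative numbers only (not taken from the paper).
priors = {"clean": 0.5, "car": 0.3, "babble": 0.2}

# State A: clustering never split on the noise factor -> one component.
mix_a = integrate_state(priors,
                        {"clean": "a0", "car": "a0", "babble": "a0"},
                        {"a0": (0.0, 1.0)})

# State B: clustering separated clean speech from the noisy conditions -> two components.
mix_b = integrate_state(priors,
                        {"clean": "b0", "car": "b1", "babble": "b1"},
                        {"b0": (0.0, 1.0), "b1": (2.0, 1.5)})
```

Because the result is an ordinary Gaussian mixture per state, a decoder that already handles mixture-density triphone HMMs needs no factor labels at test time, which is the property the abstract emphasizes.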
Pages: 410-417
Page count: 8