Continuous speech recognition based on general factor dependent acoustic models

Cited by: 4
Authors
Suzuki, H [1]
Zen, H
Nankaku, Y
Miyajima, C
Tokuda, K
Kitamura, T
Affiliations
[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi 4668555, Japan
[2] Nagoya Univ, Dept Media Sci, Nagoya, Aichi 4668603, Japan
Source
Keywords
continuous speech recognition; triphone HMMs; context clustering; Bayesian networks; voice characteristic; noise environment
DOI
10.1093/ietisy/e88-d.3.410
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper describes continuous speech recognition that incorporates additional complementary information, e.g., voice characteristics, speaking styles, linguistic information, and noise environment, into HMM-based acoustic modeling. In speech recognition systems, context-dependent HMMs, i.e., triphones, and tree-based context clustering are commonly used. In recent years, several attempts have been made to utilize not only phonetic contexts but also additional complementary information, based on context (factor) dependent HMMs. However, when the additional factors are unobserved for the test data, a method for obtaining factor labels is required before decoding. In this paper, we propose a model integration technique based on general factor dependent HMMs for decoding. The integrated HMMs can be used by a conventional decoder as standard triphone HMMs with Gaussian mixture densities. Moreover, by using the results of context clustering, the proposed method can determine an optimal number of mixture components for each state depending on the degree of influence of the additional factors. Phoneme recognition experiments using voice characteristic labels show significant improvements with a small number of model parameters, and a 19.3% error reduction was obtained in noise environment experiments.
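
The abstract does not give the integration formula, so the following is only an illustrative sketch of one plausible reading: if the factor is unobserved at test time, the factor-dependent Gaussians of a state can be marginalized over the factor, which yields a Gaussian mixture whose component count equals the number of distinct clustering leaves reached by that state. All names here (`integrate_state`, the factor values "quiet", "car", "babble", the priors, and the 1-D Gaussians) are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def integrate_state(factor_priors, factor_to_leaf, leaf_gaussians):
    """Collapse factor-dependent Gaussians of one HMM state into a mixture.

    factor_priors:  {factor_value: P(factor_value)}          (assumed input)
    factor_to_leaf: {factor_value: leaf id from clustering}  (assumed input)
    leaf_gaussians: {leaf id: (mean, variance)}, 1-D features for brevity
    Returns a list of (weight, mean, variance) mixture components.
    """
    weights = defaultdict(float)
    for factor, prior in factor_priors.items():
        # Factor values that share a leaf share one Gaussian, so their
        # priors add up into a single mixture weight.
        weights[factor_to_leaf[factor]] += prior
    return [(w, *leaf_gaussians[leaf]) for leaf, w in sorted(weights.items())]


def mixture_density(components, x):
    """Evaluate the 1-D Gaussian mixture density at x."""
    return sum(
        w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in components
    )


# Hypothetical prior over an unobserved noise-environment factor.
priors = {"quiet": 0.5, "car": 0.3, "babble": 0.2}

# State A: the tree split on the factor ("quiet" vs. the rest) -> 2 components.
state_a = integrate_state(
    priors,
    {"quiet": 0, "car": 1, "babble": 1},
    {0: (0.0, 1.0), 1: (2.0, 1.5)},
)

# State B: no factor question was chosen -> all values share one Gaussian.
state_b = integrate_state(
    priors,
    {"quiet": 0, "car": 0, "babble": 0},
    {0: (1.0, 1.0)},
)
```

Under this reading, a state that is strongly influenced by the factor ends up with more mixture components than one that is not, and the result has the form of an ordinary Gaussian mixture state that a conventional decoder could consume.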
Pages: 410-417
Page count: 8