Continuous speech recognition based on general factor dependent acoustic models

Cited by: 4
Authors:
Suzuki, H [1 ]
Zen, H
Nankaku, Y
Miyajima, C
Tokuda, K
Kitamura, T
Affiliations:
[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi 4668555, Japan
[2] Nagoya Univ, Dept Media Sci, Nagoya, Aichi 4668603, Japan
Source: IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
Keywords:
continuous speech recognition; triphone HMMs; context clustering; Bayesian networks; voice characteristic; noise environment
DOI:
10.1093/ietisy/e88-d.3.410
Chinese Library Classification:
TP [Automation Technology, Computer Technology]
Discipline Classification Code:
0812
Abstract:
This paper describes continuous speech recognition that incorporates additional complementary information, e.g., voice characteristics, speaking styles, linguistic information, and noise environment, into HMM-based acoustic modeling. In speech recognition systems, context-dependent HMMs, i.e., triphones, and tree-based context clustering have commonly been used. Several attempts have been made in recent years to utilize not only phonetic contexts but also additional complementary information through context (factor) dependent HMMs. However, when the additional factors for the test data are unobserved, a method for obtaining factor labels is required before decoding. In this paper, we propose a model integration technique based on general factor dependent HMMs for decoding. The integrated HMMs can be used by a conventional decoder as standard triphone HMMs with Gaussian mixture densities. Moreover, by using the results of context clustering, the proposed method can determine an optimal number of mixture components for each state depending on the degree of influence of the additional factors. Phoneme recognition experiments using voice characteristic labels show significant improvements with a small number of model parameters, and a 19.3% error reduction was obtained in noise environment experiments.
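The integration idea summarized in the abstract can be illustrated with a minimal Python sketch, assuming per-state factor-dependent single-Gaussian densities and known factor priors; the names below (FactorGaussian, integrate_state, state_log_likelihood) are illustrative and not taken from the paper. Each state's factor-dependent Gaussians are collected into one Gaussian mixture whose weights are the factor prior probabilities, so a conventional GMM-HMM decoder can evaluate the state without knowing the factor labels of the test data.

import numpy as np

# Illustrative sketch only: merge factor-dependent single-Gaussian state
# densities into one Gaussian-mixture state density, weighting each
# Gaussian by the (assumed known) prior probability of its factor label.

class FactorGaussian:
    """Gaussian output density of one HMM state for one factor label."""
    def __init__(self, mean, var, factor_prior):
        self.mean = np.asarray(mean, dtype=float)
        self.var = np.asarray(var, dtype=float)    # diagonal covariance
        self.factor_prior = float(factor_prior)    # assumed prior P(factor)

def integrate_state(factor_gaussians):
    """Build the integrated GMM: p(o) = sum_f P(f) N(o; mu_f, Sigma_f)."""
    weights = np.array([g.factor_prior for g in factor_gaussians])
    weights /= weights.sum()                       # normalize the priors
    means = np.stack([g.mean for g in factor_gaussians])
    variances = np.stack([g.var for g in factor_gaussians])
    return weights, means, variances

def state_log_likelihood(obs, weights, means, variances):
    """Log-likelihood of one observation under the integrated GMM state."""
    obs = np.asarray(obs, dtype=float)
    diff = obs - means                             # shape (num_factors, dim)
    log_comp = (np.log(weights)
                - 0.5 * np.sum(diff ** 2 / variances
                               + np.log(2.0 * np.pi * variances), axis=1))
    m = log_comp.max()                             # log-sum-exp for stability
    return m + np.log(np.exp(log_comp - m).sum())

# Usage: one state with two factor labels (e.g. two voice-characteristic groups)
state = [FactorGaussian([0.0, 1.0], [1.0, 1.0], factor_prior=0.5),
         FactorGaussian([2.0, -1.0], [0.5, 2.0], factor_prior=0.5)]
w, mu, var = integrate_state(state)
print(state_log_likelihood([0.5, 0.3], w, mu, var))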
Pages: 410-417
Number of pages: 8
Related Papers
50 records in total
  • [21] Conversion from Phoneme Based to Grapheme Based Acoustic Models for Speech Recognition
    Zgank, Andrej
    Kacic, Zdravko
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1587 - 1590
  • [22] Multilingual acoustic models for speech recognition and synthesis
    Kunzmann, S
    Fischer, V
    Gonzalez, J
    Emam, O
    Günther, C
    Janke, E
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS: IMAGE AND MULTIDIMENSIONAL SIGNAL PROCESSING SPECIAL SESSIONS, 2004, : 745 - 748
  • [23] Dynamically configurable acoustic models for speech recognition
    Hwang, MY
    Huang, XD
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 669 - 672
  • [24] Compact Acoustic Models for Embedded Speech Recognition
    Levy, Christophe
    Linares, Georges
    Bonastre, Jean-Francois
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2009,
  • [25] Acoustic-to-Phrase Models for Speech Recognition
    Gaur, Yashesh
    Li, Jinyu
    Meng, Zhong
    Gong, Yifan
    INTERSPEECH 2019, 2019, : 2240 - 2244
  • [26] Compact Acoustic Models for Embedded Speech Recognition
    Christophe Lévy
    Georges Linarès
    Jean-François Bonastre
    EURASIP Journal on Audio, Speech, and Music Processing, 2009
  • [28] CONTINUOUS SPEECH RECOGNITION VIA CENTISECOND ACOUSTIC STATES
    BAKIS, R
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1976, 59 : S97 - S97
  • [29] A study on continuous Chinese speech recognition based on stochastic trajectory models
    Ma, XH
    Gong, YF
    Fu, YQ
    Lu, J
    Haton, JP
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 482 - 485
  • [30] Graphical Models for the Recognition of Arabic continuous speech based Triphones modeling
    Zarrouk, Elyes
    Benayed, Yassine
    Gargouri, Faiez
    2015 16TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2015, : 603 - 608