Continuous speech recognition based on general factor dependent acoustic models

Cited by: 4
Authors
Suzuki, H [1]
Zen, H
Nankaku, Y
Miyajima, C
Tokuda, K
Kitamura, T
Affiliations
[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi 4668555, Japan
[2] Nagoya Univ, Dept Media Sci, Nagoya, Aichi 4668603, Japan
Source
IEICE Transactions on Information and Systems, Vol. E88-D, No. 3, 2005
Keywords
continuous speech recognition; triphone HMMs; context clustering; Bayesian networks; voice characteristic; noise environment;
DOI
10.1093/ietisy/e88-d.3.410
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
This paper describes continuous speech recognition that incorporates additional complementary information, e.g., voice characteristics, speaking styles, linguistic information, and noise environment, into HMM-based acoustic modeling. In speech recognition systems, context-dependent HMMs, i.e., triphones, and tree-based context clustering are commonly used. Several attempts have been made in recent years to utilize not only phonetic contexts but also additional complementary information through context (factor) dependent HMMs. However, when the additional factors for the test data are unobserved, a method for obtaining the factor labels is required before decoding. In this paper, we propose a model integration technique based on general factor dependent HMMs for decoding. The integrated HMMs can be used by a conventional decoder as standard triphone HMMs with Gaussian mixture densities. Moreover, by using the results of context clustering, the proposed method can determine an optimal number of mixture components for each state according to the degree of influence of the additional factors. Phoneme recognition experiments using voice characteristic labels show significant improvements with a small number of model parameters, and a 19.3% error reduction was obtained in noisy-environment experiments.
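
To make the integration idea in the abstract concrete, the following is a minimal Python sketch (not the authors' implementation) of how factor-dependent Gaussian state output distributions can be marginalized over an unobserved factor label into a single Gaussian mixture that a conventional triphone decoder evaluates. The factor priors, means, and variances below are invented toy values, and the function names are illustrative only.

import numpy as np

def gaussian_pdf(o, mean, var):
    # Diagonal-covariance Gaussian density N(o; mean, diag(var)).
    diff = o - mean
    log_p = -0.5 * (len(mean) * np.log(2.0 * np.pi)
                    + np.sum(np.log(var))
                    + np.sum(diff * diff / var))
    return np.exp(log_p)

def integrated_state_density(o, factor_priors, means, variances):
    # Marginalize the unobserved factor f out of the state output density:
    #   p(o | state) = sum_f P(f) * N(o; mu_f, Sigma_f)
    # The result is an ordinary Gaussian mixture, so a standard decoder
    # can evaluate it without knowing the factor labels at test time.
    return sum(p * gaussian_pdf(o, m, v)
               for p, m, v in zip(factor_priors, means, variances))

# Toy usage: two voice-characteristic factors sharing one triphone state.
o = np.array([0.3, -1.2])                  # observation vector (hypothetical)
priors = [0.6, 0.4]                        # assumed factor priors P(f)
means = [np.array([0.0, -1.0]), np.array([0.5, -1.5])]
variances = [np.array([1.0, 1.0]), np.array([0.8, 1.2])]
print(integrated_state_density(o, priors, means, variances))

Under this reading, states whose clustering shows little influence from the additional factors would share Gaussians across factors, which is one way the mixture size per state could stay small.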
Pages: 410 - 417
Page count: 8
Related Papers
50 items in total
  • [41] Online Generation of Acoustic Models for Multilingual Speech Recognition
    Raab, Martin
    Aradilla, Guillermo
    Gruhn, Rainer
    Noeth, Elmar
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2979 - +
  • [42] Acoustic Nudging-Based Model for Vocabulary Reformulation in Continuous Yoruba Speech Recognition
    Ajayi, Lydia Kehinde
    Azeta, Ambrose
    Odun-Ayo, Isaac
    Aniemeka, Enem Theophilus
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS, ICCSA 2022, PT I, 2022, 13375 : 494 - 508
  • [43] Boosting acoustic models in large vocabulary speech recognition
    Meyer, C
    Schramm, H
    PROCEEDINGS OF THE SIXTH IASTED INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING, 2004, : 255 - 260
  • [44] Efficient Sparse Banded Acoustic Models for Speech Recognition
    Zhang, Weibin
    Fung, Pascale
    IEEE SIGNAL PROCESSING LETTERS, 2014, 21 (03) : 280 - 283
  • [45] PHMM Based Asynchronous Acoustic Model for Chinese Large Vocabulary Continuous Speech Recognition
    Wu, Hao
    Wu, Xihong
    Chi, Huisheng
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4477 - 4480
  • [46] Improving Acoustic Models for Russian Spontaneous Speech Recognition
    Prudnikov, Alexey
    Medennikov, Ivan
    Mendelev, Valentin
    Korenevsky, Maxim
    Khokhlov, Yuri
    SPEECH AND COMPUTER (SPECOM 2015), 2015, 9319 : 234 - 242
  • [47] Comparing Fusion Models for DNN-Based Audiovisual Continuous Speech Recognition
    Abdelaziz, Ahmed Hussen
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (03) : 475 - 484
  • [48] Continuous Speech Recognition with a TF-IDF Acoustic Model
    Zweig, Geoffrey
    Nguyen, Patrick
    Droppo, Jasha
    Acero, Alex
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2858 - 2861
  • [50] Recognition of Speaker-Dependent Continuous Speech with KEAL
    Mercier, G
    Bigorgne, D
    Miclet, L
    Leguennec, L
    Querre, M
    IEE PROCEEDINGS-I COMMUNICATIONS SPEECH AND VISION, 1989, 136 (02) : 145 - 154