Continuous speech recognition based on general factor dependent acoustic models

Cited by: 4
Authors
Suzuki, H [1]
Zen, H
Nankaku, Y
Miyajima, C
Tokuda, K
Kitamura, T
Affiliations
[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi 4668555, Japan
[2] Nagoya Univ, Dept Media Sci, Nagoya, Aichi 4668603, Japan
Source
Keywords
continuous speech recognition; triphone HMMs; context clustering; Bayesian networks; voice characteristic; noise environment
DOI
10.1093/ietisy/e88-d.3.410
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper describes continuous speech recognition that incorporates additional complementary information, e.g., voice characteristics, speaking styles, linguistic information, and noise environment, into HMM-based acoustic modeling. In speech recognition systems, context-dependent HMMs, i.e., triphones, and tree-based context clustering have commonly been used. In recent years, several attempts have been made to utilize not only phonetic contexts but also additional complementary information based on context (factor) dependent HMMs. However, when the additional factors for the test data are unobserved, a method for obtaining factor labels is required before decoding. In this paper, we propose a model integration technique based on general factor dependent HMMs for decoding. The integrated HMMs can be used by a conventional decoder as standard triphone HMMs with Gaussian mixture densities. Moreover, by using the results of context clustering, the proposed method can determine an optimal number of mixture components for each state depending on the degree of influence from the additional factors. Phoneme recognition experiments using voice characteristic labels show significant improvements with a small number of model parameters, and a 19.3% error reduction was obtained in noise environment experiments.
Pages: 410 - 417
Page count: 8
Related papers
50 in total
  • [31] A study on continuous Chinese speech recognition based on stochastic trajectory models
    MA Xiaohui (Department of Radio Engineering, Southeast University, Nanjing 210096)
    GONG Yifan (CRIN/CNRS, France)
    FU Yuqing
    LU Jiren (Department of Radio Engineering, Southeast University, Nanjing 210096)
    Chinese Journal of Acoustics, 1997, (04) : 350 - 355
  • [32] MODELS OF CONTINUOUS SPEECH RECOGNITION AND THE CONTENTS OF THE VOCABULARY
    MCQUEEN, JM
    CUTLER, A
    BRISCOE, T
    NORRIS, D
    LANGUAGE AND COGNITIVE PROCESSES, 1995, 10 (3-4) : 309 - 331
  • [33] End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow
    Variani, Ehsan
    Bagby, Tom
    McDermott, Erik
    Bacchiani, Michiel
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1641 - 1645
  • [34] A frame-based context-dependent acoustic modeling for speech recognition
    Terashima R.
    Zen H.
    Nankaku Y.
    Tokuda K.
    IEEJ Transactions on Electronics, Information and Systems, 2010, 130 (10) : 1856 - 1864+24
  • [35] Comparison of Slovak and Czech Speech Recognition Based on Grapheme and Phoneme Acoustic Models
    Lihan, Slavomir
    Juhar, Jozef
    Cizmar, Anton
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 149 - 152
  • [36] DOMAIN EXPANSION IN DNN-BASED ACOUSTIC MODELS FOR ROBUST SPEECH RECOGNITION
    Ghorbani, Shahram
    Khorram, Soheil
    Hansen, John H. L.
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 107 - 113
  • [37] MULTILINGUAL ACOUSTIC MODELING FOR SPEECH RECOGNITION BASED ON SUBSPACE GAUSSIAN MIXTURE MODELS
    Burget, Lukas
    Schwarz, Petr
    Agarwal, Mohit
    Akyazi, Pinar
    Feng, Kai
    Ghoshal, Arnab
    Glembek, Ondrej
    Goel, Nagendra
    Karafiat, Martin
    Povey, Daniel
    Rastrow, Ariya
    Rose, Richard C.
    Thomas, Samuel
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4334 - 4337
  • [38] Gated Recurrent Units Based Hybrid Acoustic Models for Robust Speech Recognition
    Kang, Jian
    Zhang, Wei-Qiang
    Liu, Jia
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [39] Decision Tree-Based Acoustic Models for Speech Recognition with Improved Smoothness
    Akamine, Masami
    Ajmera, Jitendra
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (11) : 2250 - 2258
  • [40] CONTEXT DEPENDENT STATE TYING FOR SPEECH RECOGNITION USING DEEP NEURAL NETWORK ACOUSTIC MODELS
    Bacchiani, Michiel
    Rybach, David
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,