MDL-based context-dependent subword modeling for speech recognition

Cited by: 0
Authors
Shinoda, Koichi [1]
Watanabe, Takao [1]
Affiliations
[1] NEC Corp, Kawasaki, Japan
Keywords
Markov processes - Mathematical models - Maximum likelihood estimation - Pattern recognition systems - Speech analysis
DOI
Not available
Abstract
Context-dependent phone units, such as triphones, have recently come into use as subword units in speech recognition systems based on hidden Markov models (HMMs). Most such systems cluster the HMM parameters (e.g., subword clustering and state clustering) to control the HMM size and so avoid the poor recognition accuracy caused by a lack of training data, but none provides an effective criterion for determining the optimal number of clusters. This paper proposes a method in which state clustering is performed with phonetic decision trees and the minimum description length (MDL) criterion is used to optimize the number of clusters. Large-vocabulary Japanese-language recognition experiments show that this method achieves higher accuracy than the maximum-likelihood approach.
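The abstract does not state the criterion itself, so the following is only an illustrative sketch of how an MDL test could decide whether to split a decision-tree node. It assumes each cluster is modeled by a single diagonal-covariance Gaussian fitted by maximum likelihood, and it uses the generic two-part MDL cost (negative log-likelihood plus half the parameter count times the log of the sample size). The function names, the variance floor, and the toy data are hypothetical and are not taken from the paper.

```python
import math

VARIANCE_FLOOR = 1e-6  # assumption: guards against log(0) on degenerate data


def neg_log_likelihood(frames):
    """Negative log-likelihood of frames under one ML diagonal Gaussian."""
    n = len(frames)
    dim = len(frames[0])
    log_det = 0.0
    for d in range(dim):
        mean = sum(f[d] for f in frames) / n
        var = sum((f[d] - mean) ** 2 for f in frames) / n
        log_det += math.log(max(var, VARIANCE_FLOOR))
    # With ML estimates the quadratic term sums to n * dim / 2.
    return 0.5 * n * (dim * (1.0 + math.log(2.0 * math.pi)) + log_det)


def mdl_split_delta(parent, yes, no, total_frames):
    """Change in description length if `parent` is split into `yes` and `no`.

    A negative value means the split shortens the total description.
    """
    dim = len(parent[0])
    data_term = (neg_log_likelihood(yes) + neg_log_likelihood(no)
                 - neg_log_likelihood(parent))
    # One extra cluster adds 2 * dim parameters (means and variances),
    # each charged 0.5 * log(total_frames) under the two-part MDL cost.
    penalty = 0.5 * (2 * dim) * math.log(total_frames)
    return data_term + penalty


def should_split(parent, yes, no, total_frames):
    """Accept a candidate question only if it reduces the description length."""
    return mdl_split_delta(parent, yes, no, total_frames) < 0.0


# Toy 1-D data: two well-separated groups versus one homogeneous group.
separated_yes = [[0.0], [0.1], [-0.1], [0.05], [-0.05]]
separated_no = [[10.0], [10.1], [9.9], [10.05], [9.95]]
mixed_yes = [[0.0], [1.0], [2.0], [3.0]]
mixed_no = [[0.5], [1.5], [2.5], [3.5]]

print(should_split(separated_yes + separated_no, separated_yes, separated_no, 10))
print(should_split(mixed_yes + mixed_no, mixed_yes, mixed_no, 8))
```

Under these assumptions the penalty term grows with the amount of training data and the feature dimension, so tree growth stops on its own once the likelihood gain of a further split no longer pays for the extra parameters. A pure maximum-likelihood criterion has no such term and needs an externally chosen threshold.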
Pages: 79-86