MDL-based context-dependent subword modeling for speech recognition

Authors
Shinoda, Koichi [1]
Watanabe, Takao [1]
Affiliations
[1] NEC Corp, Kawasaki, Japan
Keywords
Markov processes - Mathematical models - Maximum likelihood estimation - Pattern recognition systems - Speech analysis
DOI
Not available
Abstract
Context-dependent phone units, such as triphones, have recently come to be used as subword units in speech recognition systems based on hidden Markov models (HMMs). Most such systems cluster the HMM parameters (e.g., subword clustering and state clustering) to control the HMM size and so avoid the poor recognition accuracy caused by a lack of training data, but none of them provides an effective criterion for determining the optimal number of clusters. This paper proposes a method in which state clustering is performed with phonetic decision trees and the minimum description length (MDL) criterion is used to optimize the number of clusters. Large-vocabulary Japanese-language recognition experiments show that this method achieves higher accuracy than the maximum-likelihood approach.
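The abstract gives no formulas, so the following is only a minimal sketch of how an MDL test might decide whether a decision-tree node should be split. It assumes each cluster is modeled by a single diagonal-covariance Gaussian and uses the generic two-part description length (negative log-likelihood plus half the parameter count times the log of the data size). The function names, the hard assignment of frames to nodes, and the variance floor are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def gaussian_neg_log_likelihood(frames):
    """Negative log-likelihood of frames under their own ML diagonal Gaussian."""
    n, dim = frames.shape
    var = frames.var(axis=0) + 1e-8  # ML variance; small floor is an assumption
    # With ML estimates, the Mahalanobis term sums to n per dimension.
    return 0.5 * n * (dim * (1.0 + np.log(2.0 * np.pi)) + np.log(var).sum())


def mdl_split_delta(parent, yes, no, total_frames):
    """Change in description length if `parent` is split into `yes` and `no`.

    A negative value means the split shortens the description and is accepted.
    """
    dim = parent.shape[1]
    likelihood_change = (
        gaussian_neg_log_likelihood(yes)
        + gaussian_neg_log_likelihood(no)
        - gaussian_neg_log_likelihood(parent)
    )
    # One extra Gaussian adds 2 * dim parameters (mean and variance),
    # each costing 0.5 * log(total_frames) under this assumed penalty.
    penalty = dim * np.log(total_frames)
    return likelihood_change + penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.normal(0.0, 1.0, size=(500, 2))
    right = rng.normal(10.0, 1.0, size=(500, 2))
    pooled = np.vstack([left, right])
    print(mdl_split_delta(pooled, left, right, total_frames=len(pooled)))
```

Under these assumptions, a tree would keep applying the phonetic question with the most negative delta and stop when no question yields a negative value, so the number of clusters follows from the data rather than from a hand-tuned likelihood threshold.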
Pages: 79-86