Decision tree-based acoustic models for speech recognition

Authors
Masami Akamine
Jitendra Ajmera
Affiliations
[1] Toshiba Corporate R&D Center
[2] IBM Research Lab.
Keywords
speech recognition; acoustic modeling; decision trees; probability estimation; likelihood computation
DOI
Not available
Abstract
This article proposes a new acoustic model that uses decision trees (DTs) in place of Gaussian mixture models (GMMs) to compute the observation likelihoods for a given hidden Markov model (HMM) state in a speech recognition system. DTs have several advantageous properties: they impose no restrictions on the number or types of features, and they perform feature selection automatically. This article explores and exploits DTs for large-vocabulary speech recognition. Equal and decoding questions are newly introduced into DTs to model the gender- and context-dependent acoustic space directly. Experimental results on the 5k ARPA Wall Street Journal task show that, as expected, context information significantly improves the performance of DT-based acoustic models. Context-dependent DT-based models are also highly compact compared with conventional GMM-based acoustic models. This compactness means that the proposed models share data effectively across context classes.
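To make the idea of a tree-based likelihood lookup concrete, the following is a minimal sketch, not the authors' implementation. It assumes a per-state tree whose internal nodes ask either a threshold question on an acoustic feature or an equality question on a discrete attribute such as gender, and whose leaves hold a precomputed log-likelihood. The class names, the attribute names, and the leaf values are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Union


@dataclass
class Leaf:
    # Hypothetical log-likelihood stored at the leaf (would be estimated in training).
    log_likelihood: float


@dataclass
class ThresholdQuestion:
    # Asks: is features[feature_index] <= threshold?
    feature_index: int
    threshold: float
    yes: "Node"
    no: "Node"


@dataclass
class EqualQuestion:
    # Asks: does a discrete attribute (e.g. gender) equal a given value?
    attribute: str
    value: str
    yes: "Node"
    no: "Node"


Node = Union[Leaf, ThresholdQuestion, EqualQuestion]


def log_likelihood(node: Node, features: Sequence[float],
                   attributes: Dict[str, str]) -> float:
    """Walk from the root to a leaf and return the leaf's stored value."""
    while not isinstance(node, Leaf):
        if isinstance(node, ThresholdQuestion):
            answer = features[node.feature_index] <= node.threshold
        else:
            answer = attributes.get(node.attribute) == node.value
        node = node.yes if answer else node.no
    return node.log_likelihood


# Illustrative tree for one HMM state; the numbers are made up.
tree = EqualQuestion(
    attribute="gender", value="female",
    yes=ThresholdQuestion(feature_index=0, threshold=1.5,
                          yes=Leaf(math.log(0.4)), no=Leaf(math.log(0.1))),
    no=Leaf(math.log(0.2)),
)

score = log_likelihood(tree, [1.0], {"gender": "female"})
```

In this sketch a lookup costs one comparison per tree level, and a single tree can serve several gender or context classes because the discrete questions sit in the same structure as the acoustic ones.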
Related papers
50 in total
  • [1] Decision tree-based acoustic models for speech recognition
    Akamine, Masami
    Ajmera, Jitendra
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2012
  • [2] Decision Tree-Based Acoustic Models for Speech Recognition with Improved Smoothness
    Akamine, Masami
    Ajmera, Jitendra
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (11): 2250-2258
  • [3] Triphone models for mandarin speech recognition based on decision tree
    Gao, Sheng
    Xu, Bo
    Huang, Taiyi
    Shengxue Xuebao/Acta Acustica, 2000, 25 (06): 504-509
  • [4] Tree-based Context Clustering Using Speech Recognition Features for Acoustic Model Training of Speech Synthesis
    Chanjaradwichai, Supadaech
    Suchato, Atiwong
    Punyabukkana, Proadpran
    2015 12TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING/ELECTRONICS, COMPUTER, TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY (ECTI-CON), 2015
  • [5] Decision tree-based context dependent sublexical units for Continuous Speech Recognition of Basque
    de Ipiña, KL
    Graña, M
    Ezeiza, N
    Hernández, M
    Zulueta, E
    Ezeiza, A
    PROGRESS IN PATTERN RECOGNITION, SPEECH AND IMAGE ANALYSIS, 2003, 2905: 259-265
  • [6] Tree-Based Estimation of Speaker Characteristics for Speech Recognition
    Blomberg, Mats
    Elenius, Daniel
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009: 584-587
  • [7] Decision Tree-based Training of Probabilistic Concatenation Models for Corpus-based Speech Synthesis
    Sakai, Shinsuke
    Kawahara, Tatsuya
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 1746-1749
  • [8] Decision Tree-Based Adaptive Modulation for Underwater Acoustic Communications
    Pelekanakis, Konstantinos
    Cazzanti, Luca
    Zappa, Giovanni
    Alves, Joao
    2016 IEEE THIRD UNDERWATER COMMUNICATIONS AND NETWORKING CONFERENCE (UCOMMS), 2016
  • [9] Face Recognition with Decision Tree-Based Local Binary Patterns
    Maturana, Daniel
    Mery, Domingo
    Soto, Alvaro
    COMPUTER VISION - ACCV 2010, PT IV, 2011, 6495: 618-629
  • [10] Speech recognition with speech synthesis models by marginalising over decision tree leaves
    Dines, John
    Saheer, Lakshmi
    Liang, Hui
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009: 1423-1426