Decision tree-based acoustic models for speech recognition

Cited by: 0

Authors:

Masami Akamine

Jitendra Ajmera

Affiliations:

[1] Toshiba Corporate R&D Center

[2] IBM Research Lab.

Source:

EURASIP Journal on Audio, Speech, and Music Processing, vol. 2012

Keywords:

speech recognition; acoustic modeling; decision trees; probability estimation; likelihood computation

DOI:

Not available

Abstract:

This article proposes a new acoustic model that uses decision trees (DTs) in place of Gaussian mixture models (GMMs) to compute the observation likelihoods for a given hidden Markov model state in a speech recognition system. DTs have several advantageous properties: they impose no restrictions on the number or types of features, and they perform feature selection automatically. This article explores and exploits DTs for large-vocabulary speech recognition. Equal and decoding questions are newly introduced into DTs to directly model the gender- and context-dependent acoustic space. Experimental results on the 5k ARPA Wall Street Journal task show that, as expected, context information significantly improves the performance of DT-based acoustic models. Context-dependent DT-based models are also highly compact compared to conventional GMM-based acoustic models, which indicates that the proposed models share data effectively across context classes.
