Regression-Based Context-Dependent Modeling of Deep Neural Networks for Speech Recognition

Cited by: 12
Authors
Wang, Guangsen [1]
Sim, Khe Chai [1]
Affiliations
[1] Natl Univ Singapore, Dept Comp Sci, Sch Comp, Singapore 117417, Singapore
Funding
National Research Foundation, Singapore
Keywords
Articulatory features; context-dependent modeling; deep neural network; logistic regression; hidden Markov models
DOI
10.1109/TASLP.2014.2344855
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
The data sparsity problem is addressed by using the decision tree state clusters as the training targets for the state-of-the-art context-dependent (CD) deep neural network (DNN) systems. The CD states within a cluster cannot be distinguished at the frame level. We surmise that the state clustering may cause an issue for the standard CD-DNNs, which has so far not been addressed in the literature. In this paper, a logistic regression framework is proposed for the CD-DNNs based on a set of broad phone classes to address both the data sparsity and the clustering problems. To address the data sparsity issue, the triphones are clustered into shorter biphones with broad phone contexts under multiple articulatory categories. A DNN is trained to discriminate the disjoint biphone clusters within each articulatory category. The regression bases are formed by the concatenated log posterior probabilities of all the broad phone DNNs. Logistic regression is used to transform the regression bases into the triphone state posteriors. Clustering of the regression parameters is used to reduce the regression model complexity while still achieving unique acoustic scores for all possible triphones. Based on some approximations, the regression model can be trained as a sparse softmax layer and its parameters can be learned by optimizing the cross-entropy criterion. The experimental results on a broadcast news transcription task reveal that the proposed regression-based CD-DNN significantly outperforms the standard CD-DNN. The best system provides a 1.3% absolute word error rate reduction compared to the best standard CD-DNN system.
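The regression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the number of articulatory categories, the cluster counts, the number of triphone states, the random scores standing in for broad phone DNN outputs, and the function names `log_softmax`, `regression_bases`, and `triphone_state_posteriors` are all assumptions chosen for illustration. The weight matrix here is dense, whereas the abstract describes a sparse softmax layer with clustered parameters.

```python
import math
import random


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]


def regression_bases(broad_phone_scores):
    """Concatenate the log posteriors of every broad phone classifier.

    broad_phone_scores holds one list of unnormalized scores per
    articulatory category (hypothetical stand-ins for DNN outputs).
    """
    bases = []
    for scores in broad_phone_scores:
        bases.extend(log_softmax(scores))
    return bases


def triphone_state_posteriors(bases, weights, biases):
    """Multinomial logistic regression (softmax) over the regression bases."""
    activations = [
        sum(w * x for w, x in zip(row, bases)) + b
        for row, b in zip(weights, biases)
    ]
    return [math.exp(v) for v in log_softmax(activations)]


random.seed(0)
cluster_sizes = [5, 7, 3]  # assumed biphone cluster counts per category
n_states = 11              # assumed number of triphone states

# One frame of random scores per category, in place of real DNN outputs.
frame_scores = [[random.gauss(0, 1) for _ in range(k)] for k in cluster_sizes]
bases = regression_bases(frame_scores)

# Dense random regression parameters, for illustration only.
weights = [[random.gauss(0, 0.1) for _ in bases] for _ in range(n_states)]
biases = [0.0] * n_states

posteriors = triphone_state_posteriors(bases, weights, biases)
```

In this sketch each category contributes a normalized block of log posteriors to the base vector, and the final softmax yields one posterior per triphone state for the frame.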
Pages: 1660-1669
Number of pages: 10
Related papers
50 in total
  • [31] Acceleration Strategies for Speech Recognition based on Deep Neural Networks
    Tian, Chao
    Liu, Jia
    Peng, Zhaomeng
    MECHATRONICS ENGINEERING, COMPUTING AND INFORMATION TECHNOLOGY, 2014, 556-562 : 5181 - 5185
  • [32] Continual learning of context-dependent processing in neural networks
    Guanxiong Zeng
    Yang Chen
    Bo Cui
    Shan Yu
    Nature Machine Intelligence, 2019, 1 : 364 - 372
  • [33] Algorithm for mandarin continuous speech recognition based on context-dependent unit between syllables
    Tsinghua Univ, Beijing, China
    Qinghua Daxue Xuebao, 9 (65-68, 75):
  • [34] ERROR BACK PROPAGATION FOR SEQUENCE TRAINING OF CONTEXT-DEPENDENT DEEP NETWORKS FOR CONVERSATIONAL SPEECH TRANSCRIPTION
    Su, Hang
    Li, Gang
    Yu, Dong
    Seide, Frank
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6664 - 6668
  • [35] State-Clustering Based Multiple Deep Neural Networks Modeling Approach for Speech Recognition
    Zhou, Pan
    Jiang, Hui
    Dai, Li-Rong
    Hu, Yu
    Liu, Qing-Feng
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (04) : 631 - 642
  • [36] State-clustering based multiple deep neural networks modeling approach for speech recognition
    National Engineering Laboratory of Speech and Language Information Processing, University of Science and Technology of China, Hefei
    230026, China
    Unknown
    ON
    M3J1P3, Canada
    IEEE ACM Trans. Audio Speech Lang. Process., 4 (631-642):
  • [37] ACID/HNN: Clustering hierarchies of neural networks for context-dependent connectionist acoustic modeling
    Fritsch, J
    Finke, M
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 505 - 508
  • [38] Context-dependent units for vocabulary-independent Spanish speech recognition
    Villarrubia, L
    Gomez, LH
    Elvira, JM
    Torrecilla, JC
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 451 - 454
  • [39] Deep Neural Networks in Russian Speech Recognition
    Markovnikov, Nikita
    Kipyatkova, Irina
    Karpov, Alexey
    Filchenkov, Andrey
    ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE, 2018, 789 : 54 - 67
  • [40] DEEP MAXOUT NEURAL NETWORKS FOR SPEECH RECOGNITION
    Cai, Meng
    Shi, Yongzhe
    Liu, Jia
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 291 - 296