Regression-Based Context-Dependent Modeling of Deep Neural Networks for Speech Recognition

Cited by: 12
Authors
Wang, Guangsen [1]
Sim, Khe Chai [1]
Affiliations
[1] Natl Univ Singapore, Dept Comp Sci, Sch Comp, Singapore 117417, Singapore
Funding
National Research Foundation, Singapore
Keywords
Articulatory features; context dependent modeling; deep neural network; logistic regression; HIDDEN MARKOV-MODELS;
DOI
10.1109/TASLP.2014.2344855
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The data sparsity problem is addressed by using the decision tree state clusters as the training targets for the state-of-the-art context-dependent (CD) deep neural network (DNN) systems. The CD states within a cluster cannot be distinguished at the frame level. We surmise that the state clustering may cause an issue for the standard CD-DNNs, which has so far not been addressed in the literature. In this paper, a logistic regression framework is proposed for the CD-DNNs based on a set of broad phone classes to address both the data sparsity and the clustering problems. To address the data sparsity issue, the triphones are clustered into shorter biphones with broad phone contexts under multiple articulatory categories. A DNN is trained to discriminate the disjoint biphone clusters within each articulatory category. The regression bases are formed by the concatenated log posterior probabilities of all the broad phone DNNs. Logistic regression is used to transform the regression bases into the triphone state posteriors. Clustering of the regression parameters is used to reduce the regression model complexity while still achieving unique acoustic scores for all possible triphones. Based on some approximations, the regression model can be trained as a sparse softmax layer and its parameters can be learned by optimizing the cross-entropy criterion. The experimental results on a broadcast news transcription task reveal that the proposed regression-based CD-DNN significantly outperforms the standard CD-DNN. The best system provides a 1.3% absolute word error rate reduction compared to the best standard CD-DNN system.
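The regression step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the array sizes, the random stand-ins for the broad phone DNN outputs, and the random sparsity mask are all hypothetical and chosen only to show the shape of the computation (concatenated log posteriors as regression bases, a masked softmax layer, and a cross-entropy objective).

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def regression_posteriors(category_log_posts, W, b, mask):
    # Regression bases: concatenate the log posteriors from every
    # articulatory category, giving shape (frames, total_clusters).
    bases = np.concatenate(category_log_posts, axis=1)
    # Sparse softmax layer: the mask zeroes the weights a state does not use.
    logits = bases @ (W * mask) + b
    return np.exp(log_softmax(logits))

def cross_entropy(posteriors, targets):
    # Mean negative log posterior of the reference state per frame.
    return -np.mean(np.log(posteriors[np.arange(len(targets)), targets]))

# Hypothetical sizes: 4 frames, two categories with 3 and 5 biphone
# clusters, and 6 triphone states.
rng = np.random.default_rng(0)
frames, sizes, n_states = 4, [3, 5], 6

# Random stand-ins for the log posteriors of the broad phone DNNs.
cats = [log_softmax(rng.normal(size=(frames, k))) for k in sizes]

W = rng.normal(scale=0.1, size=(sum(sizes), n_states))
b = np.zeros(n_states)
# Placeholder sparsity pattern; a real one would come from the clustering.
mask = (rng.random(W.shape) < 0.5).astype(float)

post = regression_posteriors(cats, W, b, mask)
loss = cross_entropy(post, np.array([0, 1, 2, 3]))
print(post.shape, loss)
```

In a trained system, the weights and bias would be learned by minimizing this cross-entropy with gradient descent, and the mask would follow from the clustering of the regression parameters rather than being drawn at random.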
Pages: 1660-1669
Number of pages: 10
Related papers
50 in total
  • [1] REFINEMENTS OF REGRESSION-BASED CONTEXT-DEPENDENT MODELLING OF DEEP NEURAL NETWORKS FOR AUTOMATIC SPEECH RECOGNITION
    Wang, Guangsen
    Sim, Khe Chai
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [2] ADAPTATION OF CONTEXT-DEPENDENT DEEP NEURAL NETWORKS FOR AUTOMATIC SPEECH RECOGNITION
    Yao, Kaisheng
    Yu, Dong
    Seide, Frank
    Su, Hang
    Deng, Li
    Gong, Yifan
    2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012: 366-369
  • [3] Context-Dependent Deep Neural Networks for Commercial Mandarin Speech Recognition Applications
    Niu, Jianwei
    Xie, Lei
    Jia, Lei
    Hu, Na
    2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013
  • [4] Conversational Speech Transcription Using Context-Dependent Deep Neural Networks
    Seide, Frank
    Li, Gang
    Yu, Dong
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 444-+
  • [5] Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
    Dahl, George E.
    Yu, Dong
    Deng, Li
    Acero, Alex
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (01): 30-42
  • [6] MDL-based context-dependent subword modeling for speech recognition
    Shinoda, Koichi
    Watanabe, Takao
    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi), 2000, 21 (02): 79-86
  • [7] A frame-based context-dependent acoustic modeling for speech recognition
    Terashima R.
    Zen H.
    Nankaku Y.
    Tokuda K.
    IEEJ Transactions on Electronics, Information and Systems, 2010, 130 (10): 1856-1864+24
  • [8] Deep Neural Networks for Context-Dependent Deep Brain Stimulation
    Haddock, Andrew
    Chizeck, Howard J.
    Ko, Andrew L.
    2019 9TH INTERNATIONAL IEEE/EMBS CONFERENCE ON NEURAL ENGINEERING (NER), 2019: 957-960
  • [9] Full expansion of context-dependent networks in large vocabulary speech recognition
    Mohri, M
    Riley, M
    Hindle, D
    Ljolje, A
    Pereira, F
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998: 665-668
  • [10] CONTEXT-DEPENDENT MODELLING OF DEEP NEURAL NETWORK USING LOGISTIC REGRESSION
    Wang, Guangsen
    Sim, Khe Chai
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013: 338-343