Regression-Based Context-Dependent Modeling of Deep Neural Networks for Speech Recognition

Cited by: 12
Authors
Wang, Guangsen [1]
Sim, Khe Chai [1]
Affiliations
[1] Natl Univ Singapore, Dept Comp Sci, Sch Comp, Singapore 117417, Singapore
Funding
National Research Foundation, Singapore
Keywords
Articulatory features; context dependent modeling; deep neural network; logistic regression; hidden Markov models
DOI
10.1109/TASLP.2014.2344855
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
The data sparsity problem is addressed by using the decision tree state clusters as the training targets for the state-of-the-art context-dependent (CD) deep neural network (DNN) systems. The CD states within a cluster cannot be distinguished at the frame level. We surmise that the state clustering may cause an issue for the standard CD-DNNs, which has so far not been addressed in the literature. In this paper, a logistic regression framework is proposed for the CD-DNNs based on a set of broad phone classes to address both the data sparsity and the clustering problems. To address the data sparsity issue, the triphones are clustered into shorter biphones with broad phone contexts under multiple articulatory categories. A DNN is trained to discriminate the disjoint biphone clusters within each articulatory category. The regression bases are formed by the concatenated log posterior probabilities of all the broad phone DNNs. Logistic regression is used to transform the regression bases into the triphone state posteriors. Clustering of the regression parameters is used to reduce the regression model complexity while still achieving unique acoustic scores for all possible triphones. Based on some approximations, the regression model can be trained as a sparse softmax layer and its parameters can be learned by optimizing the cross-entropy criterion. The experimental results on a broadcast news transcription task reveal that the proposed regression-based CD-DNN significantly outperforms the standard CD-DNN. The best system provides a 1.3% absolute word error rate reduction compared to the best standard CD-DNN system.
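The regression step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the category sizes, the number of triphone states, the random inputs, the unit weights, and the sparsity pattern (each state connected to one biphone cluster per articulatory category) are all assumptions chosen for illustration. It shows only the general shape of the idea: concatenate the log posteriors of the broad phone DNNs into regression bases, then map them to state posteriors through a sparse softmax layer.

```python
import math
import random


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(scores)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_norm for s in scores]


def regression_bases(category_logits):
    """Concatenate the log posteriors produced for each articulatory category."""
    bases = []
    for logits in category_logits:
        bases.extend(log_softmax(logits))
    return bases


def triphone_state_posteriors(bases, weights, biases):
    """Softmax regression mapping the regression bases to state posteriors.

    weights[s] is a sparse row stored as {basis_index: weight}.
    """
    scores = [
        biases[s] + sum(w * bases[i] for i, w in row.items())
        for s, row in enumerate(weights)
    ]
    return [math.exp(v) for v in log_softmax(scores)]


# Hypothetical toy sizes: 3 articulatory categories with 4, 3 and 5 clusters.
# Random numbers stand in for the outputs of the broad phone DNNs.
rng = random.Random(0)
category_sizes = [4, 3, 5]
category_logits = [[rng.gauss(0.0, 1.0) for _ in range(n)] for n in category_sizes]
bases = regression_bases(category_logits)

# Assumed sparsity pattern: each of 6 hypothetical triphone states is
# connected to exactly one cluster in every category, with unit weight.
offsets = [0, 4, 7]
num_states = 6
weights = []
for s in range(num_states):
    row = {}
    for off, n in zip(offsets, category_sizes):
        row[off + rng.randrange(n)] = 1.0
    weights.append(row)
biases = [0.0] * num_states

posteriors = triphone_state_posteriors(bases, weights, biases)
print(len(bases), round(sum(posteriors), 6))
```

In a trained system the weights and biases would be learned by optimizing the cross-entropy criterion, as the abstract states; here they are fixed placeholders so that the data flow is visible.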
Pages: 1660-1669
Number of pages: 10
Related papers
50 in total
  • [21] Modeling context-dependent phonetic units in a continuous speech recognition system for Mandarin Chinese
    Wu, JJX
    Deng, L
    Chan, J
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 2281 - 2284
  • [22] Integration of context-dependent durational knowledge into HMM-based speech recognition
    Wang, X
    tenBosch, LFM
    Pols, LCW
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1073 - 1076
  • [23] SPEECH SEPARATION BASED ON SIGNAL-NOISE-DEPENDENT DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Tu, Yan-Hui
    Du, Jun
    Dai, Li-Rong
    Lee, Chin-Hui
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 61 - 65
  • [24] A Regression Approach to Speech Enhancement Based on Deep Neural Networks
    Xu, Yong
    Du, Jun
    Dai, Li-Rong
    Lee, Chin-Hui
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (01) : 7 - 19
  • [25] Regression-Based Speech Enhancement by Convolutional Neural Network
    Erseven, Mustafa
    Bolat, Bulent
    2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2018,
  • [26] Analysis of context-dependent segmental duration for automatic speech recognition
    Wang, X
    Pols, LCW
    tenBosch, LFM
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1181 - 1184
  • [27] Regression-Based Noise Modeling for Speech Signal Processing
    de Abreu, Caio Cesar Enside
    Duarte, Marco Aparecido Queiroz
    de Oliveira, Bruno Rodrigues
    Vieira Filho, Jozue
    Villarreal, Francisco
    FLUCTUATION AND NOISE LETTERS, 2021, 20 (03):
  • [28] CONTEXT-DEPENDENT DEEP NEURAL NETWORKS FOR AUDIO INDEXING OF REAL-LIFE DATA
    Li, Gang
    Zhu, Huifeng
    Cheng, Gong
    Thambiratnam, Kit
    Chitsaz, Behrooz
    Yu, Dong
    Seide, Frank
    2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012, : 143 - 148
  • [29] Continual learning of context-dependent processing in neural networks
    Zeng, Guanxiong
    Chen, Yang
    Cui, Bo
    Yu, Shan
    NATURE MACHINE INTELLIGENCE, 2019, 1 (08) : 364 - 372
  • [30] REGULARIZATION OF CONTEXT-DEPENDENT DEEP NEURAL NETWORKS WITH CONTEXT-INDEPENDENT MULTI-TASK TRAINING
    Bell, Peter
    Renals, Steve
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4290 - 4294