Regression-Based Context-Dependent Modeling of Deep Neural Networks for Speech Recognition

Cited by: 12
Authors
Wang, Guangsen [1 ]
Sim, Khe Chai [1 ]
Affiliation
[1] Natl Univ Singapore, Dept Comp Sci, Sch Comp, Singapore 117417, Singapore
Funding
National Research Foundation of Singapore
Keywords
Articulatory features; context-dependent modeling; deep neural network; logistic regression; hidden Markov models
DOI
10.1109/TASLP.2014.2344855
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The data sparsity problem is addressed by using the decision tree state clusters as the training targets for the state-of-the-art context-dependent (CD) deep neural network (DNN) systems. The CD states within a cluster cannot be distinguished at the frame level. We surmise that the state clustering may cause an issue for the standard CD-DNNs, which has so far not been addressed in the literature. In this paper, a logistic regression framework is proposed for the CD-DNNs based on a set of broad phone classes to address both the data sparsity and the clustering problems. To address the data sparsity issue, the triphones are clustered into shorter biphones with broad phone contexts under multiple articulatory categories. A DNN is trained to discriminate the disjoint biphone clusters within each articulatory category. The regression bases are formed by the concatenated log posterior probabilities of all the broad phone DNNs. Logistic regression is used to transform the regression bases into the triphone state posteriors. Clustering of the regression parameters is used to reduce the regression model complexity while still achieving unique acoustic scores for all possible triphones. Based on some approximations, the regression model can be trained as a sparse softmax layer and its parameters can be learned by optimizing the cross-entropy criterion. The experimental results on a broadcast news transcription task reveal that the proposed regression-based CD-DNN significantly outperforms the standard CD-DNN. The best system provides a 1.3% absolute word error rate reduction compared to the best standard CD-DNN system.
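The forward pass described in the abstract can be illustrated with a minimal sketch: log posteriors from several broad phone DNNs are concatenated into the regression bases, and a softmax layer (multinomial logistic regression) maps them to triphone state posteriors. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the cluster counts, the number of target states, and the random scores standing in for DNN outputs. The weight matrix is dense, so the sparse softmax layer and the regression parameter clustering from the abstract are not reproduced.

```python
import math
import random


def log_softmax(scores):
    """Return log posteriors for one vector of unnormalized scores."""
    peak = max(scores)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_norm for s in scores]


def regression_bases(category_scores):
    """Concatenate the log posteriors of every broad phone DNN (one per category)."""
    bases = []
    for scores in category_scores:
        bases.extend(log_softmax(scores))
    return bases


def triphone_posteriors(bases, weights, bias):
    """Softmax layer (multinomial logistic regression) over the regression bases."""
    activations = [
        b + sum(w * x for w, x in zip(row, bases))
        for row, b in zip(weights, bias)
    ]
    return [math.exp(v) for v in log_softmax(activations)]


random.seed(0)
cluster_counts = [5, 7, 3]  # hypothetical biphone clusters per articulatory category
num_states = 11             # hypothetical number of triphone state targets

# Random scores stand in for the output layers of the broad phone DNNs (one frame).
category_scores = [[random.gauss(0.0, 1.0) for _ in range(k)] for k in cluster_counts]
bases = regression_bases(category_scores)

# Dense, randomly initialized regression parameters (assumption; untrained).
weights = [[random.gauss(0.0, 0.1) for _ in bases] for _ in range(num_states)]
bias = [0.0] * num_states

posteriors = triphone_posteriors(bases, weights, bias)
```

In a trained system the weights and bias would be learned by minimizing cross-entropy against triphone state labels, as the abstract states; this sketch only shows how the bases are formed and transformed for a single frame.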
Pages: 1660-1669
Number of pages: 10