Subspace Based Sequence Discriminative Training of LSTM Acoustic Models with Feed-Forward Layers

Cited by: 0
Authors
Samarakoon, Lahiru [1 ]
Mak, Brian [2 ]
Lam, Albert Y. S. [1 ]
Affiliations
[1] Fano Labs, Hong Kong, People's Republic of China
[2] Hong Kong Univ Sci & Technol, Hong Kong, People's Republic of China
Keywords
Long Short-Term Memory (LSTM); Recurrent Neural Networks (RNNs); Sequence Discriminative Training; Acoustic Modeling
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
State-of-the-art automatic speech recognition (ASR) systems use sequence discriminative training for improved performance over the frame-level cross-entropy (CE) criterion. Even though sequence discriminative training improves long short-term memory (LSTM) recurrent neural network (RNN) acoustic models (AMs), it is not clear whether these systems achieve optimal performance, since sequence discriminative training is prone to overfitting. This paper investigates the effect of state-level minimum Bayes risk (sMBR) training on LSTM AMs and shows that the conventional way of performing sMBR, updating all LSTM parameters, is not optimal. We investigate two methods to improve the performance of sequence discriminative training of LSTM AMs. First, additional feed-forward (FF) layers are inserted between the last LSTM layer and the output layer, so that these FF layers may benefit more from sMBR training. Second, a subspace is estimated as an interpolation of rank-1 matrices when performing sMBR for the LSTM layers of the AM. Our methods are evaluated on the benchmark AMI single distant microphone (SDM) task. We find that the proposed approaches provide a 1.6% absolute improvement over a strong sMBR-trained LSTM baseline.
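
The two proposed changes are only named in the abstract. As a rough, hypothetical illustration (not the authors' code), the PyTorch sketch below shows (a) an acoustic model with extra FF layers between the last LSTM layer and the output layer, and (b) a weight matrix whose sMBR update is constrained to an interpolation of rank-1 matrices added to frozen CE-trained weights. All class names, dimensions, and the exact parameterization are assumptions; the lattice-based sMBR loss itself is omitted.

import torch
import torch.nn as nn

class LstmAmWithFfLayers(nn.Module):
    """LSTM acoustic model with additional FF layers before the output layer."""

    def __init__(self, feat_dim=40, hidden=512, ff_dim=1024, n_ff=2, n_states=4000):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=3, batch_first=True)
        ff, in_dim = [], hidden
        for _ in range(n_ff):  # the extra FF layers that sMBR training can exploit
            ff += [nn.Linear(in_dim, ff_dim), nn.ReLU()]
            in_dim = ff_dim
        self.ff = nn.Sequential(*ff)
        self.out = nn.Linear(in_dim, n_states)

    def forward(self, x):  # x: (batch, frames, feat_dim)
        h, _ = self.lstm(x)
        return self.out(self.ff(h))  # per-frame senone logits

class Rank1Subspace(nn.Module):
    """W = W0 + sum_k alpha_k * u_k v_k^T, with the CE-trained W0 frozen."""

    def __init__(self, w0, k=8):
        super().__init__()
        self.register_buffer("w0", w0)  # frozen CE-trained weights
        out_dim, in_dim = w0.shape
        self.u = nn.Parameter(0.01 * torch.randn(k, out_dim))
        self.v = nn.Parameter(0.01 * torch.randn(k, in_dim))
        self.alpha = nn.Parameter(torch.zeros(k))  # interpolation weights

    def weight(self):
        # interpolate k rank-1 matrices and add them to the frozen base matrix
        delta = torch.einsum("k,ko,ki->oi", self.alpha, self.u, self.v)
        return self.w0 + delta

# Example: only alpha, u, and v receive sMBR gradients; w_ce stays fixed.
w_ce = torch.randn(512, 512)
w_smbr = Rank1Subspace(w_ce, k=8).weight()

Under this reading, sMBR backpropagation updates only the subspace parameters (plus the FF and output layers) while the CE-trained LSTM weights stay fixed; whether the paper freezes the base weights in exactly this way cannot be determined from the abstract alone.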
Pages: 136-140
Page count: 5
Related Papers
50 items in total
  • [31] Training algorithm with incomplete data for feed-forward neural networks
    Yoon, SY
    Lee, SY
    NEURAL PROCESSING LETTERS, 1999, 10 (03) : 171 - 179
  • [32] Discovery of Optimal Neurons and Hidden Layers in Feed-Forward Neural Network
    Thomas, Likewin
    Kumar, Manoj M. V.
    Annappa, B.
    2016 IEEE INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES AND INNOVATIVE BUSINESS PRACTICES FOR THE TRANSFORMATION OF SOCIETIES (EMERGITECH), 2016: 286 - 291
  • [33] Filtering Training Data When Training Feed-Forward Artificial Neural Network
    Moniz, Krishna
    Yuan, Yuyu
    TRUSTWORTHY COMPUTING AND SERVICES, 2014, 426 : 212 - 218
  • [34] Discriminative training of acoustic models for system combination
    Tachioka, Yuuki
    Watanabe, Shinji
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013: 2354 - 2358
  • [35] The role of nonlinearities in hierarchical feed-forward models for pattern recognition
    Eberhardt, S.
    Kluth, T.
    Fahle, M.
    Zetzsche, C.
    PERCEPTION, 2012, 41 : 241 - 241
  • [36] Flat acoustic sources with frequency response correction based on feedback and feed-forward distributed control
    Ho, Jen-Hsuan
    Berkhoff, Arthur P.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2015, 137 (04): 2080 - 2088
  • [37] A variable weighting based training data selection method for discriminative training of acoustic models
    Chen, Bin
    Niu, Tong
    Zhang, Lian-Hai
    Li, Bi-Cheng
    Qu, Dan
    Zidonghua Xuebao/Acta Automatica Sinica, 2014, 40 (12): 2899 - 2907
  • [38] Training data selection for improving discriminative training of acoustic models
    Liu, Shih-Hung
    Chu, Fang-Hui
    Lin, Shih-Hsiang
    Lee, Hung-Shin
    Chen, Berlin
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007: 284 - 289
  • [39] Training data selection for improving discriminative training of acoustic models
    Chen, Berlin
    Liu, Shih-Hung
    Chu, Fang-Hui
    PATTERN RECOGNITION LETTERS, 2009, 30 (13) : 1228 - 1235
  • [40] On the design of feed-forward ΣΔ interface for MEMS-based accelerometers
    Liang, Wenquan
    Yang, Changchun
    Qiao, Donghai
    Analog Integrated Circuits and Signal Processing, 2012, 72 : 3 - 10