Subspace Based Sequence Discriminative Training of LSTM Acoustic Models with Feed-Forward Layers

Cited by: 0
Authors
Samarakoon, Lahiru [1 ]
Mak, Brian [2 ]
Lam, Albert Y. S. [1 ]
Affiliations
[1] Fano Labs, Hong Kong, People's Republic of China
[2] Hong Kong University of Science and Technology, Hong Kong, People's Republic of China
Keywords
Long Short-Term Memory (LSTM); Recurrent Neural Networks (RNNs); Sequence Discriminative Training; Acoustic Modeling
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art automatic speech recognition (ASR) systems use sequence discriminative training for improved performance over the frame-level cross-entropy (CE) criterion. Even though sequence discriminative training improves long short-term memory (LSTM) recurrent neural network (RNN) acoustic models (AMs), it is not clear whether these systems achieve optimal performance, due to overfitting. This paper investigates the effect of state-level minimum Bayes risk (sMBR) training on LSTM AMs and shows that the conventional approach of updating all LSTM parameters during sMBR is not optimal. We investigate two methods to improve the performance of sequence discriminative training of LSTM AMs. First, additional feed-forward (FF) layers are inserted between the last LSTM layer and the output layer, so that these FF layers may benefit more from sMBR training. Second, a subspace is estimated as an interpolation of rank-1 matrices when performing sMBR for the LSTM layers of the AM. Our methods are evaluated on the benchmark AMI single distant microphone (SDM) task. We find that the proposed approaches provide a 1.6% absolute improvement over a strong sMBR-trained LSTM baseline.
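The short PyTorch sketch below illustrates the two ideas described in the abstract: an LSTM acoustic model with extra feed-forward layers between the last LSTM layer and the output layer, and a rank-1 subspace parameterization of the weight update for sMBR. It is a minimal sketch based only on the abstract; all layer sizes, the subspace rank, and the names Rank1SubspaceLinear and LstmAmWithFfLayers are hypothetical and are not taken from the paper.

# Minimal PyTorch sketch of the two ideas described in the abstract.
# All dimensions, the subspace rank, and the class/parameter names are
# illustrative assumptions; this is not the authors' implementation.
import torch
import torch.nn as nn


class Rank1SubspaceLinear(nn.Module):
    """Linear transform whose sequence-training update is restricted to a
    subspace spanned by K rank-1 matrices:
        W = W_ce + sum_k alpha_k * u_k v_k^T
    so that sMBR re-estimates only the low-dimensional subspace parameters
    while the CE-trained weight W_ce stays frozen (an assumed reading of
    the abstract, not a verified reproduction)."""

    def __init__(self, base_weight: torch.Tensor, rank: int = 8):
        super().__init__()
        out_dim, in_dim = base_weight.shape
        self.register_buffer("w_ce", base_weight.clone())        # frozen CE weight
        self.alpha = nn.Parameter(torch.zeros(rank))              # interpolation weights
        self.u = nn.Parameter(0.01 * torch.randn(rank, out_dim))  # rank-1 left factors
        self.v = nn.Parameter(0.01 * torch.randn(rank, in_dim))   # rank-1 right factors

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # delta = sum_k alpha_k * outer(u_k, v_k), shape (out_dim, in_dim)
        delta = torch.einsum("k,ko,ki->oi", self.alpha, self.u, self.v)
        return x @ (self.w_ce + delta).t()


class LstmAmWithFfLayers(nn.Module):
    """LSTM acoustic model with extra feed-forward layers between the last
    LSTM layer and the output layer, so that sMBR training can concentrate
    on the FF stack (feature and senone dimensions are placeholders)."""

    def __init__(self, feat_dim=40, hidden=512, ff_dim=1024, n_ff=2, n_states=4000):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=3, batch_first=True)
        layers, d = [], hidden
        for _ in range(n_ff):
            layers += [nn.Linear(d, ff_dim), nn.ReLU()]
            d = ff_dim
        self.ff = nn.Sequential(*layers)
        self.output = nn.Linear(d, n_states)   # pre-softmax senone scores

    def forward(self, feats):                  # feats: (batch, time, feat_dim)
        h, _ = self.lstm(feats)                # (batch, time, hidden)
        return self.output(self.ff(h))         # (batch, time, n_states)

Under this assumed setup, sMBR would update the FF stack and output layer normally while constraining the LSTM-layer weights to the rank-1 subspace (optimizing only alpha, u, and v); that division of labor is an inference from the abstract rather than a detail confirmed by the paper.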
Pages: 136 - 140
Number of pages: 5