DEEP-FSMN FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

Cited: 0
Authors
Zhang, Shiliang [1 ]
Lei, Ming [1 ]
Yan, Zhijie [1 ]
Dai, Lirong [2 ]
Affiliations
[1] Alibaba Inc, Hangzhou, Zhejiang, Peoples R China
[2] USTC, NELSLIP, Hefei, Anhui, Peoples R China
Keywords
DFSMN; FSMN; LFR; LVCSR; BLSTM; NETWORKS
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
In this paper, we present an improved feedforward sequential memory network (FSMN) architecture, namely Deep-FSMN (DFSMN), by introducing skip connections between the memory blocks of adjacent layers. These skip connections enable information flow across layers and thus alleviate the vanishing-gradient problem when building very deep structures. As a result, DFSMN benefits significantly from the skip connections and the deep structure. We compare the performance of DFSMN with that of BLSTM, both with and without lower frame rate (LFR), on several large-scale speech recognition tasks in English and Mandarin. Experimental results show that DFSMN consistently outperforms BLSTM by a large margin, especially when trained with LFR using CD-Phones as modeling units. On the 2000-hour Fisher (FSH) task, the proposed DFSMN achieves a word error rate of 9.4% using only the cross-entropy criterion and decoding with a 3-gram language model, a 1.5% absolute improvement over the BLSTM. On a 20,000-hour Mandarin recognition task, the LFR-trained DFSMN achieves a relative improvement of more than 20% over the LFR-trained BLSTM. Moreover, the lookahead filter order of the memory blocks in DFSMN can easily be chosen to control latency for real-time applications.
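As an illustration of the architecture the abstract describes, the NumPy snippet below sketches one DFSMN-style memory block: a lookback filter over past frames, a lookahead filter over future frames, and an identity skip connection from the layer below. This is a minimal sketch based only on the abstract; the function name dfsmn_memory_block, the filter orders, and the random coefficients are illustrative assumptions, and the paper's full formulation (per-layer projections, filter strides) is not reproduced here.

    import numpy as np

    def dfsmn_memory_block(p, prev_memory=None, lookback=10, lookahead=2, seed=0):
        # p: (T, D) frame-level hidden projections for T frames.
        # prev_memory: (T, D) memory output of the layer below, added as an
        # identity skip connection (an assumption; the paper also allows a
        # learned transform here).
        rng = np.random.default_rng(seed)
        T, D = p.shape
        a = rng.standard_normal((lookback + 1, D)) * 0.1  # lookback filter taps
        c = rng.standard_normal((lookahead, D)) * 0.1     # lookahead filter taps
        m = np.zeros_like(p)
        for t in range(T):
            acc = p[t].copy()
            # Lookback: elementwise-weighted sum over current and past frames.
            for i in range(lookback + 1):
                if t - i >= 0:
                    acc += a[i] * p[t - i]
            # Lookahead: weighted sum over future frames. This order bounds the
            # per-layer latency, which is why it can be tuned for real-time use.
            for j in range(1, lookahead + 1):
                if t + j < T:
                    acc += c[j - 1] * p[t + j]
            # Skip connection from the lower memory block keeps gradients
            # flowing when many such layers are stacked.
            if prev_memory is not None:
                acc += prev_memory[t]
            m[t] = acc
        return m

    # Stack eight memory blocks; the total lookahead latency grows roughly as
    # depth * lookahead * frame shift.
    x = np.random.randn(100, 64)  # 100 frames of 64-dim projections
    mem = None
    for _ in range(8):
        mem = dfsmn_memory_block(x, prev_memory=mem)
    print(mem.shape)  # (100, 64)

Feeding every layer the same projections x is a simplification; in the actual DFSMN each layer re-projects the output of the layer below. The skip-connected memory path shown here is the part the abstract credits for making very deep stacks trainable.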
Pages: 5869 - 5873
Page count: 5
Related Papers
50 records in total
  • [31] On designing pronunciation lexicons for large vocabulary, continuous speech recognition
    Lamel, L
    Adda, G
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 6 - 9
  • [32] Large-vocabulary continuous speech recognition: Advances and applications
    Gauvain, JL
    Lamel, L
    PROCEEDINGS OF THE IEEE, 2000, 88 (08) : 1181 - 1200
  • [33] A large-vocabulary continuous speech recognition system for Hindi
    Kumar, M
    Rajput, N
    Verma, A
    IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2004, 48 (5-6) : 703 - 715
  • [34] Large-Vocabulary Continuous Speech Recognition of Lhasa Tibetan
    Li, Guanyu
    Yu, Hongzhi
    COMPUTER AND INFORMATION TECHNOLOGY, 2014, 519-520 : 802 - 806
  • [35] An overview of decoding techniques for large vocabulary continuous speech recognition
    Aubert, XL
    COMPUTER SPEECH AND LANGUAGE, 2002, 16 (01): 89 - 114
  • [36] Phone deactivation pruning in large vocabulary continuous speech recognition
    Renals, S
    IEEE SIGNAL PROCESSING LETTERS, 1996, 3 (01) : 4 - 6
  • [37] Speaker verification through large vocabulary continuous speech recognition
    Newman, M
    Gillick, L
    Ito, Y
    McAllaster, D
    Peskin, B
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 2419 - 2422
  • [38] Connectionist language modeling for large vocabulary continuous speech recognition
    Schwenk, H
    Gauvain, JL
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 765 - 768
  • [39] Syllable-based large vocabulary continuous speech recognition
    Ganapathiraju, A
    Hamaker, J
    Picone, J
    Ordowski, M
    Doddington, GR
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (04): 358 - 366
  • [40] Integrating Stress Information in Large Vocabulary Continuous Speech Recognition
    Ludusan, Bogdan
    Ziegler, Stefan
    Gravier, Guillaume
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2641 - 2644