DEEP-FSMN FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

Cited by: 0
Authors
Zhang, Shiliang [1]
Lei, Ming [1]
Yan, Zhijie [1]
Dai, Lirong [2]
Affiliations
[1] Alibaba Inc, Hangzhou, Zhejiang, Peoples R China
[2] USTC, NELSLIP, Hefei, Anhui, Peoples R China
Keywords
DFSMN; FSMN; LFR; LVCSR; BLSTM; NETWORKS;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
In this paper, we present an improved feedforward sequential memory network (FSMN) architecture, namely Deep-FSMN (DFSMN), which introduces skip connections between memory blocks in adjacent layers. These skip connections enable information to flow across layers and thus alleviate the vanishing-gradient problem when building very deep structures. As a result, DFSMN benefits significantly from both the skip connections and the deep structure. We compare the performance of DFSMN with BLSTM, both with and without lower frame rate (LFR), on several large speech recognition tasks, including English and Mandarin. Experimental results show that DFSMN consistently outperforms BLSTM by a large margin, especially when trained with LFR using CD-Phone as modeling units. In the 2000-hour Fisher (FSH) task, the proposed DFSMN achieves a word error rate of 9.4% using only the cross-entropy criterion and decoding with a 3-gram language model, a 1.5% absolute improvement over the BLSTM. In a 20000-hour Mandarin recognition task, the LFR-trained DFSMN achieves more than 20% relative improvement over the LFR-trained BLSTM. Moreover, the lookahead filter order of the memory blocks in DFSMN can easily be designed to control the latency for real-time applications.
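The two mechanisms named in the abstract, a skip connection between the memory blocks of adjacent layers and a lookahead filter order that bounds latency, can be illustrated with a minimal sketch. The abstract does not give the memory-block equations, so the form used here (an identity skip connection plus element-wise look-back and lookahead filter taps, without stride factors) is an assumption, and the names `memory_block`, `lookahead_latency_ms`, and all parameter names are hypothetical.

```python
from typing import List

Vector = List[float]


def memory_block(
    p: List[Vector],            # current layer's projections, one vector per frame
    prev_memory: List[Vector],  # memory output of the layer below (skip input)
    lookback: List[Vector],     # assumed taps a_0..a_N1; a_0 weights the current frame
    lookahead: List[Vector],    # assumed taps c_1..c_N2 applied to future frames
) -> List[Vector]:
    """Skip input + current projection + element-wise filtered context."""
    T, dim = len(p), len(p[0])
    out = []
    for t in range(T):
        # Skip connection (identity mapping assumed): add the lower layer's
        # memory output directly to this layer's projection.
        m = [prev_memory[t][d] + p[t][d] for d in range(dim)]
        for i, a in enumerate(lookback):            # frames t, t-1, ..., t-N1
            if t - i >= 0:
                for d in range(dim):
                    m[d] += a[d] * p[t - i][d]
        for j, c in enumerate(lookahead, start=1):  # frames t+1, ..., t+N2
            if t + j < T:
                for d in range(dim):
                    m[d] += c[d] * p[t + j][d]
        out.append(m)
    return out


def lookahead_latency_ms(lookahead_orders: List[int], frame_shift_ms: float) -> float:
    """Future context, in ms, required by a stack of memory blocks."""
    # Each layer waits for its own lookahead frames, so the orders add up.
    return sum(lookahead_orders) * frame_shift_ms
```

Under these assumptions, an empty `lookahead` list makes each output frame depend only on current and past frames, and `lookahead_latency_ms` shows how the per-layer lookahead orders add up to the total future context a stacked model must wait for. The frame shift passed to it is a free parameter, not a value taken from the abstract.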
Pages: 5869-5873
Number of pages: 5