Vectorized Beam Search for CTC-Attention-based Speech Recognition

Cited by: 19
Authors
Seki, Hiroshi [1 ]
Hori, Takaaki [2 ]
Watanabe, Shinji [3 ]
Moritz, Niko [2 ]
Le Roux, Jonathan [2 ]
Affiliations
[1] Toyohashi Univ Technol, Toyohashi, Aichi, Japan
[2] Mitsubishi Elect Res Labs MERL, Cambridge, MA USA
[3] Johns Hopkins Univ, Baltimore, MD 21218 USA
Source
INTERSPEECH 2019
Keywords
speech recognition; beam search; parallel computing; encoder-decoder network; GPU
DOI
10.21437/Interspeech.2019-2860
CLC classification numbers
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
This paper investigates efficient beam search techniques for end-to-end automatic speech recognition (ASR) with an attention-based encoder-decoder architecture. We accelerate decoding by vectorizing multiple hypotheses during beam search, replacing the per-hypothesis score accumulation steps with vector-matrix operations over the vectorized hypotheses. This modification lets us exploit the parallel computing capabilities of multi-core CPUs and GPUs, yielding significant speedups, and also lets us process multiple utterances simultaneously in a batch. Moreover, we extend the decoding method to incorporate a recurrent neural network language model (RNNLM) and connectionist temporal classification (CTC) scores, which typically improve ASR accuracy but whose use in such parallelized decoding algorithms had not previously been investigated. Experiments on the LibriSpeech and Corpus of Spontaneous Japanese datasets demonstrate that the vectorized beam search achieves a 1.8x speedup on a CPU and a 33x speedup on a GPU compared with the original CPU implementation. When using joint CTC/attention decoding with an RNNLM, we also achieved an 11x speedup on a GPU while maintaining the benefits of CTC and the RNNLM. With these improvements, we achieved almost real-time processing, with a small latency of 0.1x real time, without a streaming search process.
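The central idea above, replacing per-hypothesis score accumulation with batched array operations over all hypotheses and utterances at once, can be sketched as a single beam-expansion step. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `vectorized_beam_step`, the array layout (batch `B`, beam `K`, vocabulary `V`), and the weighting `(1 - ctc_weight) * att + ctc_weight * ctc + lm_weight * lm` are illustrative choices. `ctc_logp` is a hypothetical stand-in for per-token CTC prefix-score increments, which a real decoder would obtain from a separate CTC prefix scorer not shown here.

```python
import numpy as np


def vectorized_beam_step(scores, att_logp, beam, ctc_logp=None, lm_logp=None,
                         ctc_weight=0.3, lm_weight=0.5):
    """One beam-search expansion step for a batch of utterances (illustrative sketch).

    scores:   (B, K) accumulated log scores of the K current hypotheses per utterance
    att_logp: (B, K, V) attention-decoder log-probabilities of the next token
    ctc_logp: optional (B, K, V) CTC prefix-score increments (assumed precomputed)
    lm_logp:  optional (B, K, V) language-model log-probabilities
    Returns (new_scores, prev_hyp, token), each of shape (B, beam).
    """
    # Assumed weighting: interpolate attention and CTC, add the LM with its own weight.
    att_weight = 1.0 - ctc_weight if ctc_logp is not None else 1.0
    step = att_weight * att_logp
    if ctc_logp is not None:
        step = step + ctc_weight * ctc_logp
    if lm_logp is not None:
        step = step + lm_weight * lm_logp

    B, K, V = step.shape
    # One broadcast add accumulates scores for every (hypothesis, token) pair,
    # replacing a Python loop over hypotheses.
    total = scores[:, :, None] + step          # (B, K, V)
    flat = total.reshape(B, K * V)             # all K*V candidates per utterance

    # Select the best `beam` candidates per utterance (stable sort for determinism).
    idx = np.argsort(-flat, axis=1, kind="stable")[:, :beam]
    new_scores = np.take_along_axis(flat, idx, axis=1)
    prev_hyp = idx // V                        # which hypothesis was extended
    token = idx % V                            # which token extended it
    return new_scores, prev_hyp, token
```

At the first step all `K` slots usually hold the same start-of-sentence hypothesis, so a common trick is to initialize `scores` with 0 for slot 0 and `-inf` for the others to avoid `K` duplicate expansions. End-of-sentence handling and gathering each hypothesis's decoder state and token history via `prev_hyp` are omitted from this sketch.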
Pages: 3825-3829
Number of pages: 5