Vectorized Beam Search for CTC-Attention-based Speech Recognition

Cited by: 19

Authors:

Seki, Hiroshi [1]

Hori, Takaaki [2]

Watanabe, Shinji [3]

Moritz, Niko [2]

Le Roux, Jonathan [2]

Affiliations:

[1] Toyohashi Univ Technol, Toyohashi, Aichi, Japan

[2] Mitsubishi Elect Res Labs MERL, Cambridge, MA USA

[3] Johns Hopkins Univ, Baltimore, MD 21218 USA

Source:

INTERSPEECH 2019 | 2019

Keywords:

speech recognition; beam search; parallel computing; encoder-decoder network; GPU

DOI:

10.21437/Interspeech.2019-2860

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification code:

100104; 100213

Abstract:

This paper investigates efficient beam search techniques for end-to-end automatic speech recognition (ASR) with an attention-based encoder-decoder architecture. We accelerate the decoding process by vectorizing multiple hypotheses during the beam search, replacing the per-hypothesis score accumulation steps with vector-matrix operations over the vectorized hypotheses. This modification allows us to take advantage of the parallel computing capabilities of multi-core CPUs and GPUs, resulting in significant speedups and also enabling us to process multiple utterances simultaneously in a batch. Moreover, we extend the decoding method to incorporate recurrent neural network language model (RNNLM) and connectionist temporal classification (CTC) scores, which typically improve ASR accuracy but have not been investigated for use with such parallelized decoding algorithms. Experiments with the LibriSpeech and Corpus of Spontaneous Japanese datasets demonstrate that the vectorized beam search achieves a 1.8x speedup on a CPU and a 33x speedup on a GPU compared with the original CPU implementation. When using joint CTC/attention decoding with an RNNLM, we also achieved an 11x speedup on a GPU while maintaining the benefits of CTC and RNNLM. With these benefits, we achieved almost real-time processing with a small latency of 0.1x real-time without a streaming search process.
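The core idea in the abstract, scoring all hypotheses at once instead of looping over them, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `vectorized_beam_step`, the weight values, and the assumption that attention, CTC, and RNNLM log-scores are already available as (beam, vocab) arrays are all hypothetical. In particular, real CTC prefix scoring is more involved than a precomputed array.

```python
import numpy as np

def vectorized_beam_step(acc_scores, att_logp, ctc_logp, lm_logp,
                         ctc_weight=0.3, lm_weight=0.5):
    """One beam-search step over all hypotheses at once (illustrative only).

    acc_scores: (beam,) accumulated log-scores of the current hypotheses.
    att_logp, ctc_logp, lm_logp: (beam, vocab) per-token log-scores
        (assumed to be precomputed; the weights are arbitrary examples).
    Returns (new_scores, parent_idx, token_idx), each of shape (beam,).
    """
    beam, vocab = att_logp.shape
    # Joint score for every (hypothesis, token) pair in one array expression,
    # instead of accumulating scores hypothesis by hypothesis.
    step = ((1.0 - ctc_weight) * att_logp
            + ctc_weight * ctc_logp
            + lm_weight * lm_logp)
    total = acc_scores[:, None] + step      # broadcast to (beam, vocab)
    flat = total.ravel()
    # Keep the `beam` best candidates over the flattened beam*vocab axis.
    top = np.argsort(-flat, kind="stable")[:beam]
    parent_idx, token_idx = np.divmod(top, vocab)
    return flat[top], parent_idx, token_idx
```

Each returned candidate is identified by the hypothesis it extends (`parent_idx`) and the token appended (`token_idx`). Because the whole step is expressed as array operations, the same code pattern extends to a leading batch axis for multiple utterances, which is the kind of parallelism the abstract attributes the GPU speedups to.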

Pages: 3825-3829

Page count: 5

50 items in total

[41] Improvements of Search Error Risk Minimization in Viterbi Beam Search for Speech Recognition
Hori, Takaaki
Watanabe, Shinji
Nakamura, Atsushi
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1962 - 1965
[42] Language Adaptive Multilingual CTC Speech Recognition
Mueller, Markus
Stueker, Sebastian
Waibel, Alex
SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458 : 473 - 482
[43] Speech Recognition via CTC-CNN Model
Sung, Wen-Tsai
Kang, Hao-Wei
Hsiao, Sung-Jung
CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 76 (03) : 3833 - 3858
[44] Phone Synchronous Speech Recognition With CTC Lattices
Chen, Zhehuai
Zhuang, Yimeng
Qian, Yanmin
Yu, Kai
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (01) : 90 - 101
[45] Attention-Based Models for Speech Recognition
Chorowski, Jan
Bahdanau, Dzmitry
Serdyuk, Dmitriy
Cho, Kyunghyun
Bengio, Yoshua
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
[46] Competitive crow search algorithm-based hierarchical attention network for dysarthric speech recognition
Jolad B.
Khanai R.
International Journal of Wireless and Mobile Computing, 2023, 25 (04) : 340 - 352
[47] IMPROVING HYBRID CTC/ATTENTION END-TO-END SPEECH RECOGNITION WITH PRETRAINED ACOUSTIC AND LANGUAGE MODELS
Deng, Keqi
Cao, Songjun
Zhang, Yike
Ma, Long
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 76 - 82
[48] Beam search pruning in speech recognition using a posterior probability-based confidence measure
Abdou, S
Scordilis, MS
SPEECH COMMUNICATION, 2004, 42 (3-4) : 409 - 428
[49] Efficient Conformer-Based CTC Model for Intelligent Cockpit Speech Recognition
Guo, Hanzhi
Chen, Yunshu
Xie, Xukang
Xu, Gaopeng
Guo, Wei
2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 522 - 526
[50] KEY ACTION AND JOINT CTC-ATTENTION BASED SIGN LANGUAGE RECOGNITION
Li, Haibo
Gao, Liqing
Han, Ruize
Wan, Liang
Feng, Wei
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 2348 - 2352